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Abstract. We develop a code length principle which is invariant to the 
choice of parameterization on the model distributions. An invariant ap¬ 
proximation formula for easy computation of the marginal distribution 
is provided for gaussian likelihood models. We provide invariant estima¬ 
tors of the model parameters and formulate conditions under which these 
estimators are essentially posteriori unbiased for gaussian models. An 
upper bound on the coarseness of discretization on the model parameters 
is deduced. We introduce a discrimination measure between probability 
distributions and use it to construct probability distributions on model 
classes. The total code length is shown to equal the NML code length 
of Rissanen to within an additive constant when choosing Jeffreys prior 
distribution on the model parameters together with a particular choice 
of prior distribution on the model classes. Our model selection principle 
is applied to a gaussian estimation problem for data in a wavelet rep¬ 
resentation and its performance is tested and compared to alternative 
wavelet-based estimation methods in numerical experiments. 
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CHAPTER 1 


Introduction 


This thesis describes the development of a codelength and model selec¬ 
tion principle for gaussian likelihood models which is invariant to the choice 
of parameterization of the model. We provide an invariant marginal approx¬ 
imation formula and invariant estimators which are shown to be essentially 
a posteriori unbiased under ’’reasonable” conditions on the signal to noise 
ratio and data generating model. An upper bound on the coarseness of 
discretization of model parameters is deduced. Also, we introduce the con¬ 
cept of a model class prior distribution, which enables us to discriminate 
quantitatively in terms of code lengths between different choices of prior dis¬ 
tributions on the parameters that we want to estimate. The model class 
distribution may be interpreted as a quantitive measure of the amount of 
trust we have in our prior information of the data generating process. We 
show in numerical experiments that the choice of model class prior distri¬ 
bution may be of crucial importance to the performance of estimators when 
estimating parameters in additive white gaussian noise. The principle is 
compared to the NML-principle of Rissanen in both theory and numerical 
experiments. 


1. Wavelet-based recovering of data corrupted by noise 

We will rely on the properties of discrete orthogonal wavelet bases [Dau92, 
Mal98b, Wic94] to provide us with a sparse (most coefficients are ’’al¬ 
most” zero) representation of the data sets. Empirical work [ML99] has 
shown that the family of Generalized Gaussian distributions (GGD) may be 
used to provide reasonable models for natural image data when represented 
in the wavelet domain. Wavelets have through the last 15 years been used 
extensively in problems of estimating data corrupted by additive noise (de- 
noising). The wavelet based methods may all be divided into three main 
steps: Given a dataset x G W^, do 

(1) Expand the data x G M” into an discrete orthogonal wavelet basis 

def 

W G by computing the linear orthogonal transform w = 

W^x. 

(2) Process the transformed data w in the wavelet domain to yield w. 

(3) Inverse-transform the processed transformed data w back into the 

def 

original space domain to yield the estimate x = Ww. 
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Several denoising techniques have been developed for processing in the wavelet 
domain, [DJ94, DJ95, ML99, BG95c, Vid98], the differences between 
methods depending on the type data and modeling assumptions. Com¬ 
mon to most wavelet-based denoising techniques are shrinkage-estimators 
operating in the wavelet domain, and among these, threshold estimators in 
particular. The most popular threshold operators take the form; 


Hard threshold estimator; (^x) 

Soft threshold estimator; (x) = 


Firm threshold estimator; {x) = 


J 0, if |a;| < t, 

I X, if \x\ > t. 

0, if \x\ < t, 

X — sgn {x)t, if \x\ > t. 


( 1 ) 

( 2 ) 


0, if |x| < ti, 

sgn {x)t2(\x\-ti) 
*2—ii 

X, if |x| > t2- 


if ti < |a;| < t2, 


( 3 ) 


The generic case studied in the litterature referenced above is that of recover¬ 
ing an unknown function g{t) : [0,1] —> M at sample points 0 < s* < 1, 1 < 
i < n hy providing estimates of the discrete samples 6 = G M"" 

when corrupted by additive white gaussian noise rj G W^. The samples are 
all modelled as independently and identically distributed; 

Xi = g{si) + r]i, I <i <n (4) 


where 


gi ~ AA(0, a), 1 < i < n 
and g is the underlying unknown function; 

g ; [0,1] M. 

At the sample points Si we define 

g{si). (5) 

Hard and soft threshold estimators applied in the wavelet domain were stud¬ 
ied in the work of Donoho and Johnstone [DJ94] and results on universal 
optimality of the estimators were reported; Let J* G {0,1} be the ideal 
diagonal projection operator defined by 

= h\0i\>^} ( 6 ) 

where I is the indicator function. Supposing we have an oracle available 
providing us with the Jj, then the ideal risk 6) — 

0||| of the ideal oracle estimator 




( 7 ) 
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becomes 


= E min(|6»i|,cr)^. (8) 

i=l 

The ideal risk in (8) is in general not attainable by any estimator without 
the aid of an oracle 5i, but the following result on universal optimality of 
the estimator was shown in [DJ94]: 

_ 0 g ^ (21ogn+ 1) ( 9 ) 

where is the soft threshold estimator (2) with threshold tn = (y\J‘l log n. 

Furthermore, the result (9) was shown to be asymptotically sharp in n, and 
that no estimator can come closer to the ideal risk 9) than this 

for all 9 £ when forced to rely on the data x alone. 


These results were extended to the class of firm threshold estimators [BG95c, 
BG95b] and estimates on bias and variances of the estimators have also 
been provided [BG95a]. 

However, the universal threshold tn = ( 7^/2 log n leads to an aggressive 
thresholding scheme on the data x and the resulting estimates 9{x) are of¬ 
ten in experiments and applications found to suffer from oversmoothing and 
loss of details, effects which are especially prominent in image denoising 
applications. Even though the result (9) is universally optimal, in most sit¬ 
uations of practical interest the signal 9 to be estimated is known to possess 
some degree of smoothness and this knowledge may be exploited to provide 
alternative (more sophisticated) wavelet shrinking estimators with better 
performance on this particular type of data. The percieved suboptimality 
of the universal thresholding scheme of Donoho and Johnstone in particular 
cases could be expected, as their result on the universal optimality of the risk 
of the estimator was based purely on their new result in univariate normal 
decision theory, and did not presuppose anything concerning the wavelet 
representation of the data and/or the sparseness thereof. However, several 
minimax results on wavelet shrinkage estimators over wide ranges of Besov- 
and Triebel-type smoothness constraints were reported in [DJ98]. Donoho 
and Johnstone in [DJ95] provided an adaptive hybrid thresholding scheme 
called SureShrink in the wavelet domain which was shown to be nearly min¬ 
imax optimal when the underlying function / belongs to a range of Besov 
spaces. The class of functions / ; [0,1] ^ M of total bounded variation \\f\\tv 
where 

WfWtv sup I ^ \f{si+i) - f{si)\ : 0<si<---<Sn<l, uGnI (10) 
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are found to provide a reasonable class to embed ’’most” natural images 
in, [KM03]. Functions of total bounded variation belong on the scale of 
Besov spaces, [DJ95]. The SureShrink method uses the Stein Unbiased Risk 
Estimate (SURE) [SteSl] separately in each wavelet subband to compute 
the threshold minimizing the SURE-estimate. Renormalizing the data x by 
the noise level a so that x ~ J\f{6,l) and letting 9^^\x) denote the soft 
thresholding estimator with threshold t > 0, SURE states that 

E^||0W(x) - eg = E, SURE(x,t) (11) 

where 

n n 

SURE(x,t) =^n - ^min(|xi|,t)^ (12) 

i=l i=\ 

and the SURE threshold ts is defined as 

ts arg mino<j<t^SURE(x, t), where tn V2 logn. (13) 

To circumvent issues of poor performance of SURE in cases of extreme 
sparsity of the wavelet coefficients, a measure of sparseness of the wavelet 
representation of the data is computed in each subband, and if the represen¬ 
tation within the subband is sparse ’’enough”, the universal soft threshold 

def 

estimator is used, otherwise the threshold t = min(t 5 ,t„) is used, thus 
making the method a hybrid between two different thresholding schemes. 
This method has a fast 0(n log n) implementation. Moulin and Liu [ML99] 
found (empirically) the family of Generalized Gaussian Distributions (GGD) 
to be able to provide reasonable model distributions for the probability den¬ 
sity distributions (pdf) of wavelet coefficients 9i of natural image data, and 
estimators for different GGD distributions were investigated. Results from 
similar work were reported in [CVOO]. 

2. Model selection, code lengths, prior information, invariance 

We briefly outline the connection between model selection, probability 
distributions and code length principles, for a thorough presentation on the 
theme we refer to [CT91, Ris98]. Let X be a discrete random variable 
with range A (finite or countably infinite) and pdf p{x). Let C{x) denote 
the codeword used to encode x G yl in a binary representation and let L(x) 
denote the length (number of binary bits) of the codeword C'(x). The ex- 

def 

pected length L{C) of the code C'(x) is then defined as L{C) = Ep{L{x)} = 
Y1x&aP(.^)■ Furthermore, let x” (xi,X 2 , ■■■,Xn) and define the code¬ 
word C{x^) C'(xi)C'(x 2 ) • • • C'(xn) where C'(xi)C'(x 2 ) • • • C'(xn) denotes 

concatenation of codewords. We only want to consider decodeable codes 
C, i.e. codes C where Xj / Xj ^ C{xi) / C{xj). An important class of 
such codes are the prefix codes which have the defining property that no 
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codeword is a prefix of any other codeword. Any binary prefix code C{x) 
with codeword lengths L{x) satifies the Kraft inequality. 

(14) 

x&A 


and conversely: For a given set of codeword lengths L{x), x £ A satisfying 
(14) there exists a prefix code C{x) with codeword lengths L{x), [Ris98]. 
Then we note that for a given a pdf q{x) on x G A we may define codeword 

lengths Lq{x) — log 2 q{x) and we then have 

2 -LA^) = Y ^ q{x) = 1 (15) 

x£A xGA xSA 


and conversely for given code C'{x) with codeword lengths L'{x) we may 
define a pdf r(x) on x G A by 


r(x) 


del 




(16) 


Let the entropy H{X) of the random variable X with range A and pdf p{x) 
be defined by 


H{X) ^ p{x) log 2 p{x) 


(17) 


then the following inequality holds for any prefix code C 


L{C) > H{X) 


(18) 


with equality if and only if L{x) = — log 2 p(x),Vx G A, [Ris98]. 


That is, a prefix code C with codeword lengths L{x) = — log 2 p(x) is an 
optimal code in the sense that it minimizes the expected codeword length 
L{C). Assume x” a data set given to us and let the model class M. = 
{Ml,M 2 ,...} be a set of models used to explain the data set x”. We may 
then construct a binary encoding scheme resulting in binary descriptions 
of both the model Mj in question and the data x"^ in view of this model. 
In analogy with above notation, we let L{s) denote the length of a binary 
description of an object s, we may write 

L{x^, Mi) L{x^\Mi) + L{Mi). (19) 

We will use both the terms code length and description length of the data to 
mean the length of the encoded binary string representing the description 
of the data x^. Because of (15), (16) we may restrict to considering code 
lengths and probability distributions rather than (the construction of) codes 
themselves. We use the term code length principle to denote the method of 
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assigning a code length to a dataset x"' and the model Mj used to explain 
this dataset. A good model Mi is one that leads to a short total code length 
L{x'^,Mi). The term minimum description length refers to the principle of 

def 

choosing the model M* = arg , Mi) as the model to be used 

to explain the data, that is the model providing the shortest description of 
dataset and model together. 

Different code length principles have been proposed in the litterature, we 
will here point out two; The Minimum Description Length (MDL) principle 
(and in particular; the Normalized Maximum Likelihood (NML)-principle) 
of Rissanen [BRY98, Ris98, RisOl] and the Minimum Message Length 
(MML) principle of Wallace [WF87, OH94, OB94b]. The two principles 
are similar, but distinct, for a discussion of differences see [OB94a, LanOl]. 
An important difference between these two principles stem from different 
views on the role of prior information on parameters. The following two 
citations provides some information on the MDL-view as Rissanen sees it; 

”(...) the suggestion that the (prior) distribution 7 r{9) (of parameters 9) cap¬ 
tures prior knowledge in an adequate manner is untenable and even totally 
unacceptable to many because of the interpretation difficulty whenever the 
parameter appears to be a contstant-albeit unknown. (...)”, [Ris98], page 
10 . 

And furthermore; 

”(...) In our view the parameter 9 is generated by our selecting the model 
class, and it has no other ’inherent’ meaning. (...)”, [Ris98] page 55. 

On the other hand the MML-philosophy in the view of Wallace/Freeman 
states; 

”(...) there can be no substitute for careful specification of whatever prior 
knowledge is available (...)”, [WF87]. 

Our own opinion in this issue on the role of prior information and prior dis¬ 
tributions on parameters is not quite as clear cut as in the statements cited 
above, but we may at least say this; On one hand we want to exploit and 
make the most of any prior information we have on the distribution of the 
noiseless data 9 to help in providing a good estimate 9*, on the other hand 
we do not want to state claims on the prior distribution of the parameters 
9 which are too far from the ’’truth”, whatever that may be. Introducing 
parametric prior distributions has a (heavy) price; It leads to the problem of 
providing ’’sensible” estimates of the parameters of the prior, a very difficult 
task in many cases, as indeed we experienced when applying our models and 
theory on the real world in the numerical work presented later in this thesis 
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(the problem we experienced was basically of the kind of overfitting model 
to the data). This experience motivated us to introduce and construct a 
prior distribution on model classes; The model class prior distribution en¬ 
ables us to discriminate quantitatively between different choices of prior 
distributions on the model parameters 9 which parameterize the likelihood 
function. It may be used to provide a theoretically well-founded way of 
quantitatively penalizing over-optimistic judgements of robustness and/or 
’’truth” and ’’reasonability” of prior knowledge of the data generating pro¬ 
cess as compared to some carefully chosen reference prior distribution. By 
careful we here mean that the chosen reference distribution should be not 
too informative, and (ideally) not too non-informative either. 

Also, we note that once (a prior distribution 7r(0) on) parameters 9 
are introduced, the question of invariance [Bal96, Bal97] arises: For given 
likelihood distribution f{x\9), define the Fisher information matrix F{9) by; 

F{9\, -E^^^^\ogf{x\9), ( 20 ) 

observe that 

m{x) [ f{x\ 9 ) 7 r{ 9 ) d9 = [ ;@1^|F(0)|V2 d9 (21) 

Jeee Jeee \F{9)\'-/ 

and define 

dV{9)^^^\F{9)\^/^ d9, (1.(0)"=^'-log (22) 

then note that the MML estimator is defined [OB94a] by 

^liML arg minege^(^) (23) 

and note that the integration measure dV(9) is the Riemannian volume 
element which provides a reparameterization invariant integration measure 
on the parameter manifold 0 on which 9 lives and furthermore; The choice of 
Jeffreys distribution |F(6*)|^/^/ f |F(/3)|^/^ dfd as the prior 7r(6*) is equivalent 
to assuming equal prior likelihood of all distributions parameterized by 0 G 
0 as opposed to equal prior likelihood of parameters 9, [Bal96, Bal97]. 
This choice of a non-informative prior distribution is what we will use when 
comparing our code length principle to the NML-principle of Rissanen. 


3. Connecting code length principles to wavelet-based denoising 

The observed ability of wavelet bases to provide sparse representations 
of several types of real world data sets of interest in diverse research fields 
(mammography, medical imaging, seismic data analysis) combined with re¬ 
sults from the extensive empirical and theoretical research on properties 
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of the wavelet expansions of data belonging to certain smoothness classes 
(Besov-scales, bounded total variation classes), provides information which 
may be exploited in building models, model selection and code length prin¬ 
ciples, for example in guiding the choice of prior distribution on the wavelet 
expansion coefficients of a dataset. 

As pointed out in [DJ94], the wavelet thresholding methods described 
previously may be viewed as model selection methods which pick a subset 
of the wavelet basis vectors and fits a model to the data by optimizing some 
given criterion. In the case of the universal thresholding estimators 

the criterion is the least squares method. In [Sai94] a data adap¬ 
tive model selection method for denoising data corrupted by additive white 
gaussian noise was developed by using the Minimum Description Length 
(MDL) principle of Rissanen [Ris96, Ris98] as the criterion to be opti¬ 
mized. The resulting denoising method consisted of thresholding the data 
in the wavelet domain with a hard thresholding estimator with a 

data driven threshold t. However, the model selection principle presented 
in [Sai94] was generally found in numerical experiments to result in large 
thresholds yielding very small models but also a large degree of smoothing 
in the estimated data. The explanation for this lies in the crudeness of the 
coding assumptions made in this work: A constant budget of log 2 n (n is 
sample size) bits per wavelet coefficient included in the model was allocated 
for encoding the location of the coefficient inside the vector of wavelet expan¬ 
sion coefficients of the data, leading to an extra codelength term of d log 2 n 
for model size d. This encoding of location of coefficients is in our view re¬ 
dundant in this case, as the rule for optimally selecting wavelet coefficients 
to include in the model is inherent to the model selection principle by simply 
minimizing the codelength for given model size d. In fact, it was shown in 
[CRM98] that for given deterministic noise variance (that is a is given 
prior to the selection of the model), the coding assumptions in [Sai94] leads 
to a hard thresholding scheme with threshold t = a\/3 logn which is seen to 
be larger than the universally optimal thresholds tn = cr\/2 log n of [DJ94]. 
In [RisOO] a MDL-based denoising method for data corrupted by additive 
white gaussian noise was deduced, resulting in a hard thresholding scheme 
with data driven threshold Imol- Furthermore it was argued that under 
reasonable and rather weak assumptions on the asymptotic (in sample size 
n) behaviour of the dataset x, the threshold tMDL ~ l \/log n where 
a ml is the Maximum Likelihood estimate of the noise deviance a. Another 
MDL-based (subband-dependent) method for simultaneous denoising and 
compression of image data in the wavelet domain was presented in [HY00]. 

The model selection principle we will develop is based on minimizing the 
description length of the model and dataset when encoded in the binary code 
induced by our modelled marginal distribution m{x) and a suitable model 
class prior distribution defined on the set of model classes in question. We 
will approximate the marginal distribution m{x) in (21) as follows: 
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(1) Construct a reparameterization 6 = with the property that 

the reparameterized Fisher information \F{6)\ is constant. 

(2) Use if) to reparameterize the marginal integral (21). 

^ ^ def 

(3) Letting <h(0) = <h('0(0)), expand the reparameterized marginal 

/V def 

integral around the MML estimate = arg minggQ$(6*) by 

Taylor-expanding $(0) around 

(4) Truncate the expansion of the integral to second order to yield the 
approximated marginal rh(x). 

Note that the approximation m{x) of the marginal integral (21) that results 
from the method outlined above is invariant, in that it does not depend 
on our original more or less arbitrary choice of parameterization 6 . This 
independency of the approximation fh{x) of parameterization 9 would in 
general not be the case (unlesss our original choice of parameterization 9 
was lucky enough to yield |T(0)| = constant) if we simply approximated the 
marginal integral directly by expanding f{x\9)TT{9) around the maximum 
posterior estimate 9\jj^p. 

Under some ’’reasonable conditions” on the data and prior distribution 
7r(0) which will be stated precisely later, we will show for gaussian likeli¬ 
hood models that the described second order approximation of the marginal 
integral has small error, and that the MML estimate is ’’very close” 

to the posterior mean 9'^ 

9^ = -^[ 9f{x\9)7r{9) (19 (24) 

m{x) 

that is 9\jj^p is essentially unbiased in a posterior sense. Furthermore, we 
will show that, for a gaussian likelihood and choosing Jeffreys distribution 
both as the prior distribution on the parameters and as the reference prior 
distribution for the model class distribution, the code length of the model 
and dataset is to within an additive constant equal to the NML code length 
developed in [Ris96]. 


4. Organization of thesis 

In the second chapter (following the current chapter) we define the prob¬ 
lem to be studied, describe the modeling assumptions, provide the necessary 
preliminaries on notation and theory and present the main theoretical results 
on the formula for the modelled data generating distribution. We present 
the development of the model class prior distribution and a result on the 
coarsest possible choice of discretization of model parameters in a posterior 
perspective. The longest and computationally tedious proofs are put in the 
appendices to which we refer when appropriate. In the third chapter we ap¬ 
ply the theory to the practical problem of denoising data in white gaussian 
noise and we present the results from our numerical experiments on the per¬ 
formance of our method. In the fourth chapter we extend our method to a 
case of non-white noise and present results from our numerical experiments. 
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5. Contact information and documentation 

I may at the time of this writing be reached on the email addresses: 
eirikf@math.uit.no and: efossgaard@gmail.com. The code (Ansi C) 
developed to implement the theory in this thesis in the reported numerical 
experiments may be downloaded from: 
http://www.math.uit.no/users/eirikf/. 
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CHAPTER 2 


Development of an invariant code length principle 


1. Definition of problem and data generating model 


Let M” be euclidean n-dimensional space equipped with the euclidean 
inner product (•,•) : M” x M"" —> M. Given data x = G M" 

modeled as 

X = d + r}, (25) 

where 6 = G is signal and rj = (r/i,ry„) G is noise, 

our goal is to estimate 6 . We think of 0 as the sampled projection of 
some unknown real valued function u : MP —> M, u G X, for some func¬ 
tion space X, onto some n-dimensional orthogonal basis W spanning a n- 
dimensional subspace V C X. We will assume X is some ’’sufficiently nice” 
subspace of which members possess some degree of smoothness. We 

model the noise coefficients {? 7 j}r=i independently, identically distributed 
(IID) with mean zero, variance and gaussian density function /. Thus, 
the data {xi}f=i are independently distributed (ID) with Xi ~ /(xi|0j,r) 
where 6 = ( 6 * 1 ,..., 0 „)^ are the mean values of the data Xi, i = l,...,n, 
is the variance of each Xi and / is a gaussian likelihood function. De¬ 
fine f{x\d,T) f{xi\0i,T)---f{xn\0n,T). Only d < n of the parameters 
are considered to be free nonzero parameters which we are able to 
estimate ’’reasonably” accurate under the modeling assumption (25) and 
we will model these d parameters as independently identically distributed 
(IID). Thus the set of parameters 6 G is a d-dimensional submanifold 
Qd of K”. In coordinates 9i this may be expressed by a binary index vector 
Id = ( 7 d(l)) 7 d( 2 ), •••, 7 d(^)) e { 0 , 1 }"^ where 7 ^ has exactly d nonzero ele¬ 
ments. We define 9i to be a model parameter if and only if 7d(i) = 1. Then 
we may write a prior density 7 rx(9i) on the form 


Trx(9i) 


hx(9i), if 7 d(i) = 1 
g(9i}, if jd(i) = 0, 


(26) 


where hx is some probability distribution parameterized by A centered in 
origo (zero first moment) and equals the second moment (deviance) 

and g is some density. We will restrict ttx to the class of priors which 
are everywhere smooth except possibly at the origin. We extend hx,g to 
densities on and respectively by assuming independence of the 

{9i}'^^i. It is in most cases more difficult to have a clear a priori idea of 
what a suitable prior distribution ?(r) on the parameter r should be. For 
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reasons of simplicity in the computations to come, we will restrict the prior 
distribution ?(r) on r to be the uniform distribution 

?(r) = y T £ Ir C (0,oo) (27) 

where Ir is some bounded interval. We may reorder the index i indexing 
6 = { 01 , so that 7 d(i) = 1 if and only if 1 < i < d and zero otherwise. 

We reorder the data x = {xi, ...,XnY' by the same reordering performed 

on the parameters We define 0|j { 6 i, ...,9d,0j) £ M”, 6 ± 

(0^,6'd+l, ...,6'n) e IK”, ®|| {xi,...,Xd,Oj) G M”, X± {0'^,Xd+l,...,Xn), 

where 0i is the zero vector in and O 2 is the zero vector in We have 

then the orthogonal decompositions x = x\\+ £c_l, 0 = Ox and through 

the set of model indices 7 ^ and the basis W we get an induced orthogonal 
decomposition V = Vji © hj_. We model g = 5, where 6 is the Dirac delta 
distribution, implying 0 _l = 0 and thus 6 = and 

nx{ 0 ) = 5*hx{9) = hx{ 0 ). (28) 

To set up the proper definition of the marginal integral, a few words must be 
said on the status of the parameters r, A, i.e whether we consider them to 
be deterministic parameters dehned prior to (and independent of) selection 
of model 7 ^, or stochastic parameters depending on the model 7 ^. In the lit- 
terature on model selection applied to denoising there are examples on both 
approaches [HYOO], [RisOO]. We will here always consider the parameter r 
stochastic, uniformly distributed over some bounded interval Ir C M_|_, and 
its estimator t* to be determined in conjunction with the model 7 ^. As for 
the parameter A we have deduced results on the marginal distribution for 
both cases. We will in the experiments section consider A to be stochastic 
and (for reasons of computational simplicity) uniformly distributed on some 
bounded interval. The estimator A* will therefore depend on the model 7 ^. 
For now, however, we consider A deterministic. 

Given Ir and A we define the marginal density m^^{z\Ir, A) by 

A) / f{z\e,T)Tix{0)dedT. (29) 

brl Je£R‘^,T^Ir 

We note that if A is considered stochastic, that is we consider it unknown 
to us prior to the model selection process, the integral in (29) should also 
include an integration over a bounded A-interval I\ C M+. This is discussed 
in detail below, see Proposition 5.1 and Corollary 5.1. The subscript 7 ^ in 
is used to emphasize the dependence of the marginal density on the 
selected model indexed by 7 ^. We consider m.y^ to be the data generating 
distribution in our model for the data, though it is not necessarily, and in 
most cases not, the true data-generating distribution q, say. 
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2. Outline of motivation and strategy 

We will exploit the compression abilities of wavelets and wavelet packet 
bases on broad classes of natural signals and images to provide a sparse 
representation of the data in some (possibly data driven) wavelet domain. 
This will allow a smaller data generating model (smaller model size d) and 
thus a more compact description of the data itself, parameterized by 9 and 
r. This will be essential to our use of the Minimum Desription Length 
Principle (MDL Principle) in constructing a posteriori unbiased estimators 
9'^{x), T^(x). The transforms we will consider are orthogonal transforms 
of wavelet-type. Let W G be some discrete orthogonal basis of M” 

consisting of discrete wavelet packet functions. We will consider the given 
data X to be the finest scale wavelet coefficients of the data available to us, 
so that : M” —> is a linear orthogonal operator on Define 

W'^x, 0^ W^9. For notational simplicity we will drop the 

superscripts w, and assume that x and 6 are data and signal expanded in 
some fixed suitable basis of wavelet type. 

As we will se below, for many choices of ’’realistic” prior distributions 
for the parameters, our models will result in estimators 6* belonging to 
the class of thresholding estimators as have been described in [DJ94] and 
[BG95c]. Threshold estimators are MAP-estimators for the class of GGDjy 
priors with shape parameter 0 < u < 1 as demonstrated in [ML99]. It is 
known from [DJ94], that the MSE universally ideal (meaning optimal over 
all 9 G M”) threshold value tn grows like a\/2 log n as n ^ oo where n is the 
sample size and a the noise deviance. Furthermore, note that the formula 
tn ~ \/2 log n for the MSE ideal threshold value only applies for large n. 
Eor smaller n on the order of a few hundred the MSE optimal threshold 
values are significantly smaller than ay/2 log n, and this remains true for an 
even larger range of sample sizes n for the lower threshold ti in the firm 
threshold estimator (3). The performance of the estimators in (l)-(2) when 
using the universal MSE optimal threshold values tn is often found not to 
be satisfying on several types of natural data encountered in problems of 
applied nature in that it leads to too much smoothing in the estimates. 
This lack of performance is mainly due to the fact that the universally 
optimal MSE value of the threshold t is too large, in other words t grows 
’’too fast” with increasing dimension n of the dataset. Several refined/data 
adaptive threshold schemes as in [BG95b, DJ95, CVOO, ML99] have 
been suggested. We will use model selection in a wavelet basis to determine 
the relevant dimension d < n of the dataset in this basis and the compute 
the resulting data adaptive estimators 9^ and We will seek to derive a 
model selection principle and estimators 9^, which are invariant to the 
choice of parameterization of our models. 

The rationale behind the idea of decomposing the data x into x = x\\ + 
x±_ is the observation that the part of data x consisting of signal is efficiently 
compressed, meaning it can be accurately represented in the sense of small 
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squared loss by a small subset of its expansion coefficients in a wavelet- 
type basis W, whereas the noise is essentially not compressible in this type 
of basis. Thus, to some extent it is possible to choose the space V\\ so that it 
contains most of the signal and therefore the space V± will contain mostly 
noise. We will make use of the Minimum Description Length Principle 
[Ris98], [BRY98] to determine the ’’best” signal subspace V|| of the space 
V where Vjj = Spanj...^^(-j)^;^{rOj} and is some subset of the 

column vectors of the full basis matrix W = 


3. Definition of the model class 

We need to know how to determine £C||. As mentioned above, the mar¬ 
ginal density is likely not the true data generating distribution q. De¬ 
pending on to which extent is able to approximate g, we can expect 
to approximate q more or less closely in the space of probability dis¬ 
tributions by optimizing the choice of the model index vector 7 ^ under the 
modelled data generating distribution (29). Beyond some subset of param¬ 
eters of size d' < n, it may be meaningless to try to estimate 

more parameters as these parameters do not capture more of the properties 
of the unknown underlying true data generating distribution q. That is, 
further adding of parameters to our model will result in overfitting to 
the specific dataset x at hand, [Bal96, Bal97]. 

With this in mind, for given likelihood distribution f[x\6,T) and prior 
distribution let denote the class of all models with d nonzero 

parameters 6i as dehned by index vectors 'ya G {0,1}"^. Because each index 
vector 7 (i index a different data generating distribution we will say 

that Md is a model class for the modelled data generating distribution 
There are (^) ways to pick k elements out of n elements. Therefore the 

number of distinct models inside each model class is (^). Letting Ai 
Ufc=o denote the collection of all model classes under consideration, we 
have \Ai\ = \^k\= ® ~ This yields a total of 2"^ different 

models. 


4. Invariant Laplace-approximation of marginal density 

We will in this section develop a theory of a parameterization invariant 
approximation of the marginal distribution m^^{x) by expanding the defin¬ 
ing integral (29) about certain points 9* and t* . We will start with the 
simplest case where we have complete knowledge of the prior distribution 
Trx{9), that is we know all its parameters. The result is shown in in Theorem 
4.1. Then we proceed to the case where an estimate of the parameters of 
the prior distribution has to be estimated from the given data set x. 

The result is shown in Corollary 5.1. 
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Definition 4.1. The Fisher information matrix F{(3) for a likelihood 
funetion /(a;|/3) parameterized by parameters /3 = (/3i, is defined as 

F{I3) &-E^i^±logf{x\l3) |. (30) 

As explained in [Bal96] the Fisher information matrix induces a metric 
on the Riemannian parameter manifold in the space of distributions param¬ 
eterized by /3 and this metric is invariant to smooth transformations of the 
parameter vector /3. We have therefore the following result: 

Proposition 4.1. The integration measure dV{(3) = \F{(3)\^^‘^dp is a 
reparameterization invariant integration measure on the parameter manifold, 
where IF"! denotes the absolute value of the determinant of the Fisher matrix 

F. 

'' '' def 

Proof. Let /3 = define a reparameterization of (3 with g{z\P) = 
f{z\if{f3)). The volume element dV{j3) in the reparameterized system is 
dV{P) = |Jv>(/3)^ F{'il){P))J,p0)\^/‘^dp, where Jy,{$)ij is the jacobi 

matrix of the transformation xjz. Then the prior density tta transforms as 
7rx{l3)dp Px{0)d0 under (3 i/>(/3) where p^(/3) = 7i:x{if0))\Jy,{0)\- 

We have to show that dV0) = \F0)\^/‘^d0 where F0) is the Fisher 
information matrix of the likelihood function g. We observe that 

log f{z\'if0)) \ 

dMj I ’ 

by the chain rule we have 


4 (/ 3 ) *^= 1 ^ -F, 


log g(z0) 
dPidPj 


= -E, 


a^iog/(z|/3)aiog/(z|/3) d^/3k \ 

fc 9(dk dPidPjj' 

now it is easily verified that | } = 0, y k, and what re¬ 

mains is the ij element of the matrix J^0Y'F{(3) Jy,0), thus we get 
dV0) = |J^(/3)^F(/3)J^(/3)|V2 d$ = |F(/3)|V2|j^(/3)| d$ which proves 
the invariance of dV (/3) to smooth transformations of /3. □ 



( 32 ) 
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where dV{P) is the reparametrization invariant integration measure dis¬ 
cussed above. We have the following result 

Proposition 4.2. The integrand is invariant to repa- 

rameterizations /3 = 'ip{(3). 


Proof. To see this, simply observe that 


9{z\(3) 


P0) 

F(/§)|V2 


/(^l^(/3)) 




= f{z\P) 


F(/3)|V2- 


(33) 

□ 


We note that — log (/(z|/3)7rA(/3)/|F(/3)|^/^) is, up to terms not depending 
on data z or parameters /3, the same expression one seeks to minimize in 
estimator and model selection by the Minimum Message Length (MML) 
principle in [OB94b]. 

We may now proceed to calculate the integral in (29) by a Laplace 
method which is invariant to reparameterizations. The Laplace method 
for evaluating marginal densities was investigated in [TKK89], [TK86], 
[KTK88] in the univariate case which may be straightforwardly extended 
to the the multivariate case of IID variables whereas in our case we face the 
problem of evaluating the marginal density in the multivariate case of ID 
variables which are not identically distributed, e.g different means {E{zi} = 
Oil 1 < i < d). Using the notation and definitions from above, we write 


m^^(z|/^,A) = ^ / f{z\e,T) 

Pt\ Je&e.T&i-r 




1 


IT 


r| J 


060,re/. |i"(0,r)|V2 

exp [—4>(z, r, 0)] dV{0,T) 

7^x{0) 


dV{e,T) 


where — <I>(z, r, 0) log 


fiz\0,T) 


|F(0,r)|V2 


(34) 

(35) 

(36) 


and define the invariant MML-estimators by 


e* = arg min 4>(z,r,0), t* = arg min 4>(z,r,0), (37) 

assuming the existence of extremal points 9* and t* where §§( 2 :, r, 0)|g_g* 
= 0 and ^{z, r, 0)|^_^* = 0. It suffices that <h(z, r, 6) is a convex function 
in each of the parameter arguments r and 9i, i = 1, ...,d. If we knew the 
exact form of the integration measure dV{6,T), we could approximate the 
marginal density m^^{z\Ir, A) by expanding the integral (35) around 9* and 
T* up to some order in 9 and r. However, when doing such an expansion we 
want ’’low order asymptotic convergence” of the expansion series, to avoid 
both complex computations and complex resulting formulas possibly difficult 



4. INVARIANT LAPLACE-APPROXIMATION OF MARGINAL DENSITY 


21 


to analyse and implement. By ’’low order asymptotic convergence” we mean 
that second order Taylor approximations of <1> in (35) will be ’’accurate 
enough” for our purposes in the sense that asymptotically in the sample 
size n, our low order expansion of the integral will converge ’’sufficiently 
fast” to the exact value of the integral. We will define ’’accurate enough” 
and ’’sufficiently fast” later. This ’’low order asymptotic convergence” may 
be difficult to achieve in arbitrary chosen parameterizations 6, r. Also, the 
result would depend on our more or less arbitrary choice of parameterization 
of the distributions / and ttx in the first place. On this background we seek 
a reparameterization t ^ t and 6i ^ 9i, 1 < i < d yielding dV{6,T) 
dV{f,0) = vq dO df where [uo]^ is some positive real constant number. To 
construct such a reparameterization we will limit our investigation to the 
case of a gaussian likelihood /. We then write 


f{z\0,T) 



(38) 


Let r be some real positive dimensionless constant number and let tq be 
some real positive constant with [ro]« = [r]tj. We choose 


T = ^(f), V)(0) = To, 9i = 4>{9i,T) ^ 172 ^* 



1 < i < d. 

(39) 


The Fisher matrix F{0,f) then evaluates to (see appendix) 

2 


F,Ae,f) = { 


1 dip{T) 
?/)(f) dr 


n I T 

2 4 2L^k=l 


r, 

_l-y, 1 diPir) 
2 3 ) dr 

2 ^ 


if i = j = 1 , 

iii = j, 1 < ij < d, 

if i = 1, 1 < j < d, 

if j = 1, 1 < i < d, 

else. 


(40) 


As shown in the appendix, the determinant of F{9, f) as given in (40) above 
evaluates to 


imui = 


/ 1 d^{f) 
df 


2 


(41) 


Now, our choice of reparameterization in (39) implies 9i, 1 < i < d and f 
are dimensionless parameters. Therefore we may put \F{9,f)\^^‘^ = 
where r is some positive real dimensionless number. This gives us together 
with (41) and the initial condition in (39) the equation 




1/2 


df 


V^(f), '0(0) = To. 


(42) 
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We choose the plus-sign in (42). This choice implies no loss of generality, as 
it is only a matter of sign convention on the parameter f. Solving (42) then 
gives 


^(f) = To exp 




(43) 


where f, r, r are dimensionless numbers and [ro]tj = [r]„. For notational 
convenience we define 


def / r V 

ed = {- 


and 


. def 2 \ 2 
— 


n 


(44) 


(45) 


Define 


d(z,f,0) - log 


g{z\e,T)p{e) 


|F(0,f)|V2 
/ TTX [(f){e,T)) |J 0 ,^| 


= — log , 

|jJ_^F(0(0,f),V’(r))J<^,^|V2 

where we used (33). Furthermore, define 


= $(2;,V’(t),0(0)) 
(46) 


e*{z) =^arg min d(z,f,0), t*{z) arg min l•(z,f,0) (47) 

where we have assumed 4>(z,f,0) is convex in each of its parameter argu¬ 
ments f and 6i, i = 1, ...,d, thus the existence of f* and 6* is guaranteed. 
The integral in (35) defining m^^{z\lT-, X) may then be rewritten as 

m.y^{z\Ir, X) = / exp 4>(z, f, 0)') dO df. (48) 

Now, our constructed reparameterization above will provide us with the 
necessary means for approximating the marginal m^^{z\lT, X) to sufficient 
accuracy by a second order approximation which is invariant to reparame- 
terizations. 


Theorem 4.1. (Invariant second order approximation of marginal den¬ 
sity) Let X G he the given data set under the model (25) and an index 
vector G {0,1}"^ of model indices with d nonzero elements, 0 < d < n. 
Let 6* and t* be the invariant estimators defined in (37). Let Ir C (0, oo) 
be a bounded closed interval eontaining the MML-estimate t*. Let f{x\6,T) 
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be a gaussian likelihood function of data x and 7rx{9) a prior density on the 
parameters 0 G with a given variance . Let x = + x±_ he the 

orthogonal decomposition of the data induced by the selected model jd of size 
d. Let F{9 ,t) denote the (d + 1) x (d + 1) Fisher matrix of the likelihood 
function /(£c|0,r) with respect to parameters 9, r. Let H{x,9,t) denote 
the (d + 1) X (d + 1) Hessian matrix of 'K\{9)f{x\9,T) \F{9 ,t)\~"^, let Pq 
denote the Gaussian distribution function. Then the marginal A) 

defined in (29) may be expressed as follows: 


/ \ 1 

my^{x\Ir,X) = f{x\T*,9*)7rx{9*)\Ir\~^x 


\H{x, T*, 9* 


IIPg({t*)-^\9*\{1 + o 


U=l 


{1 + 0 (a )}{1 + C} 


(49) 


The formula (49) applies under the following sufficient conditions: 


(1) (Shape of prior) ttx{0) = C • X^ exp(—/ i(A 2 0)), some constant C > 0, 
X > 0, where h is an integrable, symmetric function of 6 such that 

lim h{X^9)=QO. (50) 

| 6 »Hoo 


(2) (Heaviness of tails, integrability and smoothness on the prior) There 
exist constant real numbers < v < 2, B( < By, Cy > so that the 
inequalities 


B'y < h{X2 9) < By + Cy 


X^O 


V [0]v G M, I <i < d, 


(51) 


and 



[5^ . 1 J 


- Qk 

1 I' 

,1 

0 < 


< Cy 

'J 

d9>^ 

A20 

- V 


hold for V \9]v G M, and for all 1 < k < oo, 1 < i < d. 


(52) 


(3) (SNR and model size) The number 0 < C < 1 defined by 
C sup ^!^Cyn\n - l\ 011 (A,r*)) " < 1 


where LI{X,t) is the signal to noise ratio (SNR) 


Q{X,t) 


def dX ^ 
nT~^ 


(53) 
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(4) (The size and location of the interval Ir) The interval Ir satisfies 


T* elrC 


exp 


/ log^ A^(A,p,7d) \ 2 

V N{X,u,-fd) ) 


, T* exp 


/ 21 og^ A^(A,p, 7 d) \ " \ 

V ) ) 

(54) 


where 


n—d-\-2 
2 ’ 


i/ 0 < p < 1 

ifl<v<2. 


We add that if the conditions (l)-(6) listed above are satisfied, one may then 
show the following bounds on k and 

(5) (The approximation error O {k) from the Taylor terms above second 
order) |k| may be bounded from above by 


3 (SSi(r-,A)) = 




i=i 


exp (p*ie*p 


+ 


+ 


1 


E 


t*{9 




N{X,v,-id) ^^gxp(ir*(0*)2 


( 2 vr)- 


Id) 


^ T*(x\p)-\9X)(x\\{fi)-\Q*^ 


exp I 


( 0:)2 + ( 0|)2 


(55) 


(6) (The contribution ^ from the integral o/exp(—<I>) over \ {Uj=i ‘S'i} 
where Si is the ’’quadrant” of Mfi containing 9*.) The number ^ may be 
bounded from above by 

jj|l+ 2Pg (-rf |a3||(i)|") sup7rA=i(t)/7rA=i (uq (px^fi) 


i=l 


teR 


inf 


1 + erf [ Tf\xM 




vrA=i(i) 


7rA=i no rfxp) 


sup 


.^LAr7„(0) 

(271)2 


7rA=i(t) 


n -1 


7rA=i no rfxfii) 


(56) 
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if 


7rA=l Uo T{X\\{1 


> 


2C„u 


where ti G Ir, uo{s) = (^r 2 (A,T)) ^ and 


-L^ T^xu{i) , y iG-fd 


def 


1 

T2X 


u-1 






1 + 


1 

T^X 


(SSi(Lr)) 


if 0 < n <1 
ifl<v<2. 

(57) 


Proof. A proof is provided in the appendix. □ 

The invariant approximation of the marginal density m^j^{x\Ir,\) in 
(49) may now be fed into a code length principle to yield a best model 
size estimate d* and the best model 7 ^* for a given dataset x G M”. This 
will yield a model selection principle invariant to reparameterizations in the 
sense explained in above sections. 


5. Generalized Laplace-approximation of marginal density 

We now proceed to the case where the variance parameter of the 
prior distribution is unknown to us. We will then have to estimate 

the parameter A from the given data set. This implies that the density 
m^{x\lT-,\) as written in (49) in Theorem 4.1 is not the marginal density 
for the data x as it contains the data dependent parameter A. We must 
integrate out the parameter \ G I\ from the formula in (29), that is the 
marginal m^^{x\lT-.,I\) now becomes: 

m^^{x\lT-, Ix) 17 —j- [ f{x\0,T)Tr{0\X)l{X) dO dr dX (58) 

\'t\ Je£R‘i,T£l-r,\£l\ 

where 1{X) is a prior distribution on the parameter A. Let Ix C be a 
bounded interval, we model A as uniformly distributed on Ix, and identically 
zero outside Ix, that is 


/(A) = 


if A G Ix 
0 , otherwise 


(59) 


Now, by means of Theorem 4.1 we may write: 


m^^{x\I-r,Ix) 


1 1 {27r)^f{x\e*,T*) 

I I7aI \H{x,6*,T*)\^ 


(i + o(A))(i + e)x 
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nPG((r*)^|0:i{l + o(C)}) [ 7r{e*\X)dX 

^ ^ JAe/x 


where we have ignored any dependency of k, 6 * , t* on A in the integration 
interval Ix. This assumption will hold if we choose the location of the interval 
Ix properly and its width small enough as may be seen by examining the 
proof of Theorem 4.1. We will preserve parameter invariance by following 
the same procedure of invariant Laplace-expansions as in sections above by 
expanding the desired integral in (60) about a certain point A*. We need 
some definitions. Define 


E{X) -Ee 


and define 


^{e,x) = -iog 


■logvr(0|A) 


7r(0|A) 


|t;(a)|: 


and define 


A* = arg in4>oT(0, A). (63) 

We have the following result; 

Proposition 5.1. Let 7r(/3|A) = nf=i^(/5*l^) a density on /3 G 
with variance X~^. Let E{X) ='^ —iH/g log7r(/3|A)|, let 'I'(/3, A) ='^ 

— log and let X* arg infxyQ^{l3, X). Let Ix C M+ be a bounded 

_|£;(A)D J 

interval such that X* £ Ix- Then we have: 


where 


I ^(^|A)dA = Shlh^{l + OH}, ( 64 ) 

'as/a |Taa(/3,A*)|2 


def |'hAAA(/3, A*)I 


\'fxxif3,X*W- 


VdJ- 


Furthermore, the formula (64) is invariant to reparameterizations of the 
distribution tt. 

Proof. We define a map y : A ^ A such that 


A = X(A), x(0) = Ao and E{X) 'll* -Eq 


log7r(0|x(A)) =A-2 gR+ 
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where A > 0 is some constant number. The result follows by computing the 

Taylor-expansion T(A) of T(/3,x(A)) in A about the point A* 
and approximating the integral 


[ exp(-T(/3,A)) |£;(A)|^/2 ^A = A-i / exp (-T(/3, A)) dA 

Jx&ix jasT ^ ^ 


= A 


-1 


exp 


dX 


(67) 


to second order in A. 


□ 


Using the result in (64) together with Theorem (4.1) we now have the 
following expression for the marginal density lx)'- 

Corollary 5.1. Given the eonditions and notation in Theorem 4-1 and 
Proposition 5.1, we may state: 


d+2 


m^^{x\Ir,Ix) = 


i27r)^f{x\T*,9*)7r{e*\X*)\Ir\-^\I x'-^ 
\H{x,T*,e*)\^\'i>xx{0*,X*)\^ 


nFG((T-)3|«-|{l + 0 


U=l 


{1 + 0 ( k )} {1 + .^} {1 + 0 (( j )} . ( 68 ) 


Proof. This is an immediate concequence of Theorem 4.1 and Propo¬ 
sition 5.1. □ 


6. Marginal renormalization 


As pointed out in [Ris98], using estimated values 0*{x), t*{x), X*{0*) 
for given data x instead of true parameter values 0, r, A, does not yield 
an optimal code length for the data. That is, there is redundancy in the 
resulting code [Ris98] , and to remove this redundancy means to renormalize 
the marginal m{x) in order to to get a proper density for use with the 
(IN)MDL Principle. For given data set x and likelihood function f{x\f3i), 
Rissanen defined in [RisOO] the normalized maximum likelihood (NML) 
marginal density m^MLix) by; 


niNMiix) 


(M f {x\P* jx)) 
Cnml 


where (3* is the ML estimator and 


Cnml=[ f{z\l3*{z))dz. 

Jz&Y 


( 69 ) 
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The integration region Y in the case of a gaussian likelihood / was chosen 
through the ML parameter estimators to be the least possible hyperspheres 
containing the data x±_ and While the ML esimator r*(a;_L) for the 
noise naturally imposes a spherical geometry on the part of the data space 
containing the noise, the same cannot be said of the ML estimator (3* for 
the parameters f3, which is simply: /3*(z||) = zy. 

It was shown in [RisOl] that the density rriNMLix) satisfies: 

rriNMiix) = arg inf sxrpEx^g |log 1 (70) 

geOgeG I <l{x) j 

where G is the class of distributions g(x) satisfying Ex^g log {g{x)/ f{x\l3*))< 
oo, Q IS the class of all densities and f3* is the ML-estimate of the parameters 
(3. This means that the code length — logrriNMiix) induced by the density 
q{x) = rriNMiix) minimizes the expected difference between the the code 
lengths —logf{x\(3*{x)) and — logg(a;), where expectation is taken with 
respect to the ’’worst case” data generating distribution g. To compute 
the optimal code length, the domain Y 3 x on which the marginal density 
m{x) is defined, has to be chosen properly, [RisOO]. The expression (75) 
shows that the question for us is then for given data set x to choose the 
region 0* 9 6 * properly. The choice of 0* may be of importance to our 
code length principle. This region should not be chosen too big, neither too 
small. How to accomplish this? In [RisOO], the choice of 0* was taken to 
be the spherical region 

= l^: G : 0 < ||z||2 < ||a;|| H^l. (71) 

This is perhaps the most ’’honest” choice of integration region 0*: In the 
absence of a prior distribution on the parameters 6 the choice of a flat 

prior distribution on a domain with no preferred direction certainly does not 
impose any prior constraints on the parameter 0, except that its expected 
norm is ||®|| || 2 . We will use the geometry imposed on the signal space yy by 
the prior distribution 7r(0|A) through the invariant ML-estimator A* defined 
in (63). That is, for given data set x and model 'jd, we choose 

0* = je G : \*{e) eJx3 A*(r(a;))| (72) 

for some chosen interval Jx C M_|_. This choice will ensure that 0*{x) G 0*. 
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Figure 1. Illustration of different geometries on the region 
0 < J2i=i the cases u = 2 (circle) and u = 

uo < 1 (star) and model size d = 2. 

We have the following result on the marginal normalization 


Proposition 6.1. Under the conditions given in Theorem 4-1 and the 
following additional condition on the number X (u, A, r) defined below: 




where 



n — d + 2 


V i/0 < u < 1 
u/2 if \ < V <2. 


1-C 


(74) 


n-d \ d n-d idC^iy^^n{X*,T*)) 

r» S. r» I 


n — d + 2 


n — d + 2 


+ 0 


' dCy{^n{X*,T*)) 

n — d + 2 


-h(v) 


n 2N 


+ dlogPG ( (r*)^/^ inf \e, 


l<i<d 


+ 2 log (^Ti") + log 


de* dr* 


7r(r|A* 


•■e0*,T*eJ^* |A| \It\ |4';^;^(0*, A*)|2 
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< logC^^ 

< log 


n — d \ d ^ 1 , 

de* dr* 7r(r|A*) 
lAl \It\ A*)| 2 

We note that 


+ log 


/ 

Je* 


(75) 


l<i<d 


n. 


C sup Ci/pIp — 1 | ( ■)^n(A*,r 


d 




* I u —2 


and by (44^) we have in the case of a prior distribution tt\{ 6) flat in 0 that 

r(8'|A*) de* dr* 


+ log 


‘ee*,r*eJ^* |^'aa(0*,A *)|2 \h\ \It\ 


Proof. A proof is given in the appendix. □ 

6.1. Comments on Proposition 6.1. 

(1) We note that using the expression (414) and retracing the steps 
leading up to (75), the inequality in (75) may be sharpened by 
replacing the term by: J2i=i ti\*,,y{T*,6*) 

= C,u\u - 1| (fO(A*,r*))“"/' Eti |(r*)i/20*|-2. 

7. Discriminating between model classes 

We now proceed to find the best choice of model for our estimation 
problem, where best choice means choosing the set of of model indices 7 ^* 
yielding the shortest desciption in terms of code length in a binary alphabet 
of both model and data x when encoded under the modelled data generating 
distibution m^flx)IC^^, that is 

7d* arg infi<rf<„_^^g|o_i|„ {- log {m^flx)/C^fl} . (77) 

To find the optimal model, we must express the total code length L{x, M^, 'jd, d) 
needed to encode the data x for given model class M^, model 7 ^ and model 
size d. In previous work presented in [HYOO], the process of encoding the 
data X, the model index vector 7 ^ and any model hyperparameters a which 
are used in defining the model, was decomposed as follows: 

L{x,-fd,a) L{x\a,-fd) + Li'ldla) + L{a). (78) 

[HYOO] then proceeded to address the question of how to select a suitable 
prior distribution for the model index vector la [HYOO] the 7 rf(i) were 
modelled as IID bernoulli distributed with parameter p, and a procedure for 
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estimating the hyper-parameter p was provided. However, the authors in 
[HYOO] noted that the estimation of the hyper-parameter p is non-trivial, 
and some care had to be taken to avoid too large models. This is an expe¬ 
rience we share from our own numerical experimental work as well: Simply 
using the marginal formulas (49), (68) and optimizing the resulting code 
length for the marginal distribution over the model size d, did in our numer¬ 
ical experiments more often than not lead to a code length expression with 
no minimum for d < n/2 or an optimal model size d so large (comparable to 
n/2) that the stated sufficient conditions under which the asymptotic mar¬ 
ginal expressions (49), (68) are valid, are not satisfied. This suggests to us 
that we have been asking for too much in our use of the MDL principle; The 
extra degree of freedom introduced by the prior distribution 'Kx{6) through 
the parameter A has to be treated with care. However, we do not wish to 
introduce additional (hyper)parameters into our model classes as this 
will raise the problem of providing reasonable models and estimates for these 
parameters, which proved difficult to us: The resulting model selection prin¬ 
ciples and estimators performed poorly in experiments. We will therefore 
adopt a different strategy from that in [HYOO]. We observe that the model 
classes depend on the choice of prior distribution 7r(0|A) and that there 
is (under the conditions and model given here for the data x) no a priori 
reason to believe that all prior distributions are (or should be considered) 
equally likely for the given dataset x. Therefore the code length measure 
induced by the marginal density defined in (29) on the collection A4 
of model classes under consideration should be extended to a code length 
measure that in some way also quantifies our belief in a particular choice of 
prior distribution for given dataset x and model (25). Obviously, we can¬ 
not compare all possible choices of prior distributions. Also, we suspect the 
key to solving our problem described above of model overfitting the data, 
lies in the parameter A which is the only parameter discriminating between 
different models for given data x, prior distribution p, model index vector 
7d, noise level estimate t* and parameter estimates 9*. Therefore we will 
confine ourselves to constructing a measure for comparing our chosen prior 
distribution p\^ with variance A“^ to some chosen reference distribution qx^ 
with variance A~^. The distribution px^ is taken to be the best choice of 
model distribution for the unknown true distribution of 9 that we are able 
to come up with based on our prior knowledge (or our more or less qualified 
guesses) of the data and the data generating process. The distribution qx^ is 
taken to be some kind of canonical reference prior distribution against which 
we will compare our choice pxp- The problem is then to find a reasonable 
way to compare pXp and qx^ ■ For this we will make use of the entropy S (p) of 
a distribution p, that is the expected (mean) code length for encoding data 
using p. Let Vn denote the collection of probability distributions defined on 
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\ then the entropy S : V„ 


is dehned as: 


S{p) '^=1^ Eg {- logpie)} = - / p{e) logpie) de. (79) 


Now, we define 


del Ajj 


a = 


(80) 


and 


exp (5 (paJ - S{qx^)) 
faGi^ - '^(^A,)) da' 


(81) 


We will call D{px^, qx^) a model class prior distribution. We note that if the 
distributions p, q live on the same parameter manifold, it is obvious that 
S{pxp) — S{qxq) is parameterized by a. If p and q live on different parameter 
manifolds, we may still parameterize S{pxp) — S{qxq) by a = Xp/Xq if we 
ensure that p and q are normalized w.r.t an integration measure which is 
invariant to reparameterizations e.g the Fisher information measure. Some 
care will have to be taken in the choice of normalization interval la in (81). 
This question will be further adressed below. The density D{p,q) defined 
in (81) may then be used to measure our prior belief in the distribution p 
relative to the reference distribution q. 

We proceed to compute S{p) for the distributions of interest to us here, 
that is the GGD distribution and Jeffreys prior. In the case of a GGD 
distribution, we have 


S{p\)=^-[ px{0)'^OgpxiO) de 


a trivial computation yields 


Proceeding with Jeffreys prior distribution 

|F(0,r)|V2 


L 


re(o,T*),|!0||2<R 


|F(6»,r)|V2rf6» 


dr 


(83) 


<iR{e,T) 


(84) 
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where F is the Fisher matrix of the likelihood distribution. We note that 
the integration measure T~^dT in (84) is needed to make the normalization 
of qR invariant to reparameterizations. Using the gaussian distribution for 
the likelihood, a trivial computation yields 


\F[e,T)\y^ = 


and some calculation then yields 



||0||2<i?,Te(o 

nd-2 
2 


pog(§) 


r(f + i) 


+1 - 


(85) 


using Stirling approximation on the Gamma function yields 


S{qR) = +log(^ 


d — 2\ d / 27reR\ 

+ ‘ + 2'“*(dT2j 


i log (t) 


+ log 


TT 


(d + 2) 


- 1/2 


+ 0(d 


-D 


( 86 ) 


We may now compute the measure D{p, q) for the different p and q of interest 
to us in the current context. First, we consider the case where the reference 
distribution q is taken to be Jeffreys prior distribution. This choice of refer¬ 
ence distribution may be interpreted as a very pessimistic one, in that this 
choice of a distribution flat in 6 states our complete lack of prior knowledge 
of the noiseless data 6, or rather our denial of imposing a more informative 
prior distribution on 6, that is a distribution with less entropy as reflected in 
(86) where we see that the entropy of the distribution q^ is up to an additive 
constant very close to the maximum entropy dlog V2TTeX~^ [CT91] attained 
by a d-variate Gaussian distribution with variance = i?^/(d-|-2). Choos¬ 
ing a GGD distribution as the candidate for the true prior distribution p\^ 
and setting 


\ - 


1 „2 def Ap cfef . 

XL 5 Of , Iq, 


d + 2 


Xn 


expression (81) then becomes 

-p(‘ll“8(“^)-ilog(AAN)) 


D{p\^,q\) = 




a 


-d/2 


r“^ a da 

Jao 


d - 2 / ao \ 2 
a 


a, 


-1 


1 _ ( so 

Ol 




(87) 
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We assume that d ^ 1, and/or ai ao- We conclude that D{pxp, qx^) 
for all practical purposes only depends on the lower bound ao, and this 
dependence is very strong. Therefore, ao has to be chosen carefully. If 
we had a discretization Aa > 0 of the parameter a, this would suggest a 
lower bound on our choice of ao; namely ao A Aa. In lack of any prior 
information of how to choose ao, we settle for the most conservative choice 
ao = Aa as this choice of ao will clearly make the code length contribution 
— log D{p, q) largest possible. By definition (80) we may deduce the following 
connection between discretizations AAp, AA^ on parameters Ap, A^ and the 
discretization Aa, respectively: 




( 88 ) 


Using the result shown in Proposition 11.1 on the posterior coarsest dis¬ 
cretization of parameter A, together with definitions (61)-(62) we get 


AAp = 


x/ i^p{d + 2) 


A* for p GGD distribution in (82) (89) 


AA, 




a/2 


x/dT2 




for q Jeffreys distribution in (84) 


yielding 


(90) 


Aa 


CxM 


1 I A q l^p 

cT 2 

Ap 

\ v'p{d + 2) 


(91) 


where 


a* and 0 <C'a,,C'a, <1. (92) 

\ 

Setting Cxp = Ga^, this leads to 


D{p\p,q\,) 


/WA^ 1 4GA^a* / 1 + ^ A 

Vay I d — 2 \i2p{d + 2)) 


-1 


/ 4Clil + !±) Y 

y Vpid + 2) J 


(93) 
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in the case where the true distribution p is taken to be the GGD distribution 
in (82) and the reference distribution q is taken to be Jeffreys distribution 
in (84). In the case where both p and q are taken to be GGD distributions 
with shape parameters Vp and t'g respectively, using (89) we get 


D{p\p,q\^) 


a J \ d-2 yup{d + 2) j 



d 

4 


(94) 


We note that in practice (93) and (94) will be evaluated by plugging in the 
estimate a* defined in (92) for a, and so we conclude that to leading order 
the contribution from the model class prior distribution D{p,q) when the 
true prior distribution p and reference distribution q both are taken to be 
GGD, will be; 

A (l + ^)\ 

(95) 

and when the reference distribution q is taken to be Jeffreys distribution, 
Vq is replaced hy Vq = 2 in (95) above. That is, the model class prior dis¬ 
tribution D{p, q) does not discriminate between q a gaussian or g a Jeffreys 
prior distribution when the likelihood for the data is gaussian. We note that 
the number 0 < Caj, < 1 is connected to an estimate of an upper bound 
on the relative error of the posterior density through the relation (153) in 
Proposition 11.1, and we see from (95) that the code length contribution 
from — \ogD{p,q) will contain an additive term — ^logCA^. We then end 

up with the following process for encoding the data x for given model class 
iv) 

, prior distribution p, reference prior distribution q, model size d and 
model index vector yd- 

L (x,Mjf\yd,dJ = L (x\Mjf\yd,dJ + L |, 7 ^, L{yd\d) + L{d) 

= - log 2 {m^^{x)/C^^) - log 2 D{p, q) -h L{yd\d) + L{d). (96) 

If we have no prior information on the optimal index vector yd and model 
size d, then L{yd\d) = log 2 2"^ and L{d) = log 2 (n) are constants, we will 
adopt this view here. The optimal model 7 )). is then found by computing 

Id* arg info<rf<„,^_^g{o,i}«T(a:, M^^\yd, d). (97) 


7.1. Comments on the term D(p,q). 

(1) Using the prior distribution D{p, q) on the model classes and choos¬ 
ing both the prior distribution p and the reference distribution q 
to be Jeffreys distributions with Xp independent of Xq, we will end 

. /4C2 \ 

up with — log D{p,q) ss — | log ( j to leading order, as can be 


verified by following the same steps as we did above for the case p a 
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GGD distribution and q a Jeffreys distribution. Since our numeri¬ 
cal simulations reported in the experiments section below show that 
the NML code length principle of Rissanen [RisOO] does not work 
well at all on datasets with signal to noise ratios below some level, 
whereas our INMDL principle in our simulations is seen to work as 
well as or better than NML-principle over broad regimes of SNR, 
it is tempting to suggest that when applied to denoising problems, 
the NML code length should be modified by adding an extra term 
. fiCl \ 

of — I log ( yiqif- 1 • We think that more work in this area is needed 

as our reported numerical experiments seem to indicate that our 
suggested code length term of — log D{p,q) does not yield optimal 
model sizes, particularly not for the very small and the very high 
SNR values. 

(2) We emphazise that in the case where we choose p = q for prior 
distribution p and reference distribution q, we have a = 1 and 
D{p,q) becomes a constant. 

8. Model selection by the INMDL Principle 

We must address in detail the question of how to actually find the opti¬ 
mal set of model indices when given data x. Equation (97) tells us 

Id* =^arg info<rf<„^^^g{o,i}nL(®,Mj''\7rf,d). (98) 

where L{x, is defined in (96). Using formula (68) together with 

(96),(97) we see we have to solve 

7d* = arg 

= arg info<rf<n, 7 rfe{o,i}" {HM^^^\'yd,d) 

-log [/(a;|r*,0*)]^ - log [7r(0*|A*)]^ - log(27r)^ 

d 

+ log [|Lf(a;,r*,r)|l]^-J]logPG((r*)^|0:i{l + o(C)}5) 

i=l 

+ log[|^AA(0*,A*)|5j +logC7,+log[|/,|-I/ aILI. (99) 

Inserting the expressions (413), (416) yields the expression 

L{x,M^^\d,d) = -logD{Tr,q) +logC^^ + -—- log [7r(0*|A*)]^ 

3, , . 1, n — d+1^ fn — d + 2\ d , 

- - log (2g) - - log 2-^-log j + 20(0 

+ log [l-fri • |/a||„ + - -^7^ log [IIXl III + ||X|| - e'lll] 


V 
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+ log 


1 


a 

|'I'aa( 0 *, - J^logPc ((r*) 5 | 0 ;| {1 + o(C)}") 


2=1 


+ - log 1 - 


1 + o(C) 


1 + 


l*±lli 


1*11 


Using the result in Proposition 6.1 we may write 


( 100 ) 


( 101 ) 


where 




log / 

j0*e0*,r*eJ^ 


log - - log (27r) - - log 2 
7r(0*|A*) de* dr* 

. \^xx{o*,x*)\y^ M W\ 


n — d + 2 ^ 

+ -2- 



*11 -^111 

ll*±lli 


^logPG((r*)^|0ri{l + o(C)}^) 


+ 2 log 



2 

l + o(C) 



d 

+ 2»(0 


( 102 ) 


and 


Q (x,M^'"\jd,d^ -logD{7r,q) + -—+ log [|/^||/a|] 


n — d+1 


log 


Ti d “t“ 2^ Ti d~\~‘2 r.. ..o-i 

+ -^-log[||*±|| 2 ], 


271 


- log 


7r(r|A* 


7r(r|A* 


d9* dr* 

~ Ji7\ 


\'i/xx{0*,X*)\2 / Je*e0*,T*eJr* \'i/xx{d*,X*)\2 lU 
We have the following result; 


(103) 


Proposition 8.1. Assume the conditions stated in Theorem 4-1 and let 
q denote the chosen reference prior distribution. Then, for given data x 
and prior distribution tt, the optimal model class and model is 

up to a code length precision of size Ad{x), selected as follows: Let Sd C 

An '= {1,2,3, ...,n — l,n} be of size d < n and define jdii) = ^ if i ^ Sd 

and 7 d(i) = 0 otherwise. The sets Sj (0 U Sj-i, where I G An \ Sj-i, 
are computed iteratively by minimizing the criterion C'(a;|| (j)|5j_i) for each 
index j : 1 < j < d < n over the set of indices I G An \ Sj-i by putting 
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x\\{j) '^= x{l) and defining 
C(®ll(j)|S’,_i) =^-(n - d + 2) ,,^ 


X n - Xu 


d 


d\xiiij)\ 




TT{e*\x* 


TT{e*\x* 


dO* dr* 


\^!x\{e*,X*)\2 / ^•e0*,r*ej; |^'aa(0*,A*)|2 I-^a| l^r 


(104) 


def 


where xp 0* and X* are given by the model defined by the set Sj =S'j_iU{^}. 

r de* dr* 7r(0*|A*) 

le*£e*,T*£j^* |/a| \It\ |4'aa(^*,A*)|5 


Ad{x) =hogC^^ - log 


+ 


n — d + 2 


log 1 + 


1*11 -^'111 


+ - log ( 1 - 


l*±lli 


1 + 


+ ^o(C) - ^log(27r) - ^ log 2 


l*±lli 


*11 -^*\\2 


l + o(C) ^ 

^logPG((r*)5|0*|{l + o(C)}^ 


i=l 


(105) 


and the value of d* is determined by minimizing the code length expression 

n — d + 2 


Q (x, , 7 d, d'^ = - log D(7r, q) + 


+ log[|/^||/A 


n — d+\ 


log 


n — d + 2\ n — d + 2 


271 


log [ll*±lli], 


- log 


7r(r|A* 


7r(r|A* 


de* dr* 


\^\\{e*,X*)\^ / i0*e0*,r*eJ,- |4 'aa(0*,A*)|5 \h\ \It 


(106) 


with respect to d, and D{TT,q) is given by (93) or (94) and the total code 
length expression L ^x, d^ is given by 

L (x,Mj^^\jd,dJ = Q +^d{x). 


Proof. Under the given conditions the optimality of the selection al¬ 
gorithm defined by minimizing the criterion C'(-) in (104) follows by dif¬ 
ferentiating the expression (103). Then observe that the total code length 
expression is given by (101). The result follows by recognizing that = 




□ 
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8.1. Comments on Proposition 8.1. 

(1) In the case we will be concentrating on: tta is a GGD and A* is the 
ML estimator the criterion C'(-) in (104) becomes 


C(*ll(i)|5,_i) ^^^-{n-d + 


+ 


d-2 


m 


i0*i^5i®ii(j)r 

Thus, the model selection process in this case may be implemented 
by a quicksort procedure. 

(2) Using the bounds in (75), (76) on logC.^^ we may easily compute 
bounds on A^. 

(3) We note that if —ij^ < 1, we have (by using the Taylor expan- 

sions centered in y = 0 of log(l ± y), 0 < y < 1) 

1 n .n - d\\xii - e*\\l d 3 1 

log -- o c - K log ( 2 ^) - o log 2 


- log 

< Arf 


L 


|a;±|l2 2 

7r(0*|A* 


2 

de* dr* 


e0*,T*eU* |Taa(0UA*)|^ I^a| \It 


/I ^ , n-d+14||a;|| d 

<iogG, + ^-+ T 

-dlogPc inf \9*\) - ^ log (27r) - ^ log 2 

\ i&'fd / 2 2 


- log 


7r(r|A* 


de* dr* 


‘e0*,T*eU* |Taa(0*, A*)|2 LaI \It\ 


(107) 


9. The INMDL- versus NML-principle for gaussian likelihood 


It may be of interest to know how the code length-principle we have 
developed in the previous sections defers from the code length principle de¬ 
veloped by Rissanen in [Ris98] and [Ris96] in the special case of a gaussian 
likelihood function. Starting out from the expression in (49), recalling the 
initial definitions of prior distributions on the parameters in (26) and (27) 
we define the joint prior distribution TTx{e)(;{T) as follows 




del 


\FiO,T] 


fe&e 


0£O*,t£I-, 


|F(6>,r)|2 de dr 


(108) 


Now, it is easy to verify that this choice of Jeffreys prior in (108) as a joint 
prior distribution satisfies the conditions on the prior '7rA(0) stated in Theo¬ 
rem 4.1 when the likelihood function / is gaussian. However, the Theorem 
4.1 was deduced under a flat prior distribution on the parameter r. But 
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by the proof of Theorem 4.1 we see that the special choice of joint prior 
distribution ?(r)7rA(0) in (108) transforms under the chosen reparameteri- 
zations given in the proof of Theorem 4.1 to a constant and therefore does 
only contribute as a constant in any of the integrals discussed in the proof 
of Theorem 4.1. Because of this fact, the formula (49) is still valid if we re¬ 
place 7rA(0*)|/T|~^ in (49) by the expression for 7rx{0*)<^{T*) given in (108). 
We observe that the prior distribution in (108) in the case of a gaussian 

likelihood is smooth in 0 and therefore the term nti PG{ir*)h*) in (49) 

is to be replaced by 1. By Corollary 5.1 the code length defined in (96) now 
becomes 


L = -logD{TT,q) +logC^^ - log[f{x\e* 


d+l 1 

- log (27r) 2 - log 




+ log 


'esevrs/r 


\F{e,T)\^ de dr . 


(109) 


Now, by the fact that |'I'aa(0*, = \/{d + 2)/2/A* is a constant (in¬ 

dependent of 6* with 1/A* = (d -|- 2)“^ lla^li III) and may therefore be taken 
out of the integral in (442) we get 

log <1^74 = log \ ~ (|^aa(0*, A*)|^/2|/a|) • 

( 110 ) 


Furthermore, in this case we have that tt = q = Jeffreys distribution and so 
the parameter a defined in (80) is identically 1, i.e deterministic, and there¬ 
fore the discretization Aa = 0. By (81) we get D{p, q) = exp(O)/ exp(O) da 
which is a constant, and may therefore be omitted from the code length 
expression. Letting L' {x,^^) denote the NML code length as developed in 
[Ris96] we have 

L' {x, -id) = - log [/(®|/J*)]^ + log 

+ \og([ |I(/3)|5 d/j) +0(1) (111) 

Vd/3ee*x+ / 


where 


/3S(r,e+6R« 

( 112 ) 


and thus (111) may be rewritten as 


d+l 


L\x,-yd) = -log[/(a;|0*,r*)]„ -log(27r) 2 
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+ log 


'0ee*,rei-r 


|F( 0 ,r )|2 de dr +o(l) 


(113) 


By (109), (110) and (113) and observing that H{x,t*,0*)= we 

conclude 

L - L' {x,jd) = log constants + o(l). 

(114) 

We note that the difference L (^x, — L' (®,7d)= ^d{x) + o(l), 

where Ad{x) is the code length precision in our INMDL model selection 
principle as given in Proposition 8.1. Thus, in model selection, we may ex¬ 
pect our INMDL principle to yield results very close to the NML-principle 
of Rissanen [Ris96], [RisOO] in the case of a gaussian likelihood function 
f(x\d,T) and Jeffreys prior (108) as a joint prior distribution We 

summarize our findings; 


Corollary 9.1. Assume the conditions given in Theorem 4-1 and Propo¬ 
sition 6.1 and Proposition 8.1. Let L denote the code length 

as presented in Proposition 8.1 and let V (*, 7 ^;) denote the NML code length 
as developed in [Ris96]. Then 

L (^x,Mji'\jd,d^ - L' (®, 7 rf) = log ^ ^^^^ 2 ) constants +o(l). 

(115) 

Proof. See discussion above. □ 


10. The posterior mean of parameters 

Given the model 7 ^ we want to compute the posterior means 0^ and 
of the parameters 6 and r, that is 


0 “ Ee,r {0} (116) 

(117) 


The posterior p-/^{0,t\x) is defined by Bayes rule: 

P7d(D 01®) -3^/(®|r, 9)ttx{9). 


( 118 ) 
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Let (r, 0^)^ = T((r,0^)^) denote the reparameterization induced by the 
mappings 4> : {f, 9) ^ 6 and 'll) ■. t ^ t given in (39) and (43). Let 
J-f be the Jacobian of T as computed in (294), let T-f be the 3-tensor of 

second derivatives of T and for notational simplicity define (3 (r, 0^)^, 

(3 (f, . Then we may write to leading order 

p = r{0*) + JriPliP - 01 + T{01{0 - 01(0 - 01^ (119) 


and thus 

p^=p* + Jr{01 

+ Tr{01 / , 

.//3eTxi 




i0-P* 


exp 


4(®,/3)) 


m^lx) 


|F(/3)|V2 d0 


*\T 


(/3-/3*)(/3-r) 


exp 


-^(®,/3)) 


m^lx) 


|F(/3)|V2 d0. 


( 120 ) 


Proceeding as we did in the proof of Theorem 4.1 by claiming 0* G 
and splitting up the integrals in (120) into integration over the two disjoint 
domains Ml in the 0-variable, the following result is a straightforward 
consequence of the proof of Theorem 4.1 together with the observation (con¬ 
sider J-f in (294)) that the integrated contributions from the Ff-part of the 
expansion for this particular T does not contribute to leading order and may 
be neglected: 

Corollary 10.1. (Corollary of the proof of Theorem f.l). The poste¬ 
rior bias of the estimators 6* and t* under the conditions in Theorem 4-1 
may be 'written on the form: 


= 0 ; + ( r *)-2 


_i (27r) 2 — l)(p — 2) 


6 




1 


exp 


X < 


sgn 


m+ E 


u-l 


sgn(e]) 


exp(iT*(9*)2) 


+0 E 


n — d 


C=i 

Ee,T{T} = r*{l-F 
-1-0 (p(p — l){y — 2) 

I z/—2 


*/o*\2 


exp i --t*{9 



1 < i < d. 


( 121 ) 


EU I ^r ' { (9* ) + (r*) i 9* exp (- (9*0^') } \ 


(n-d) {^n{X,Tiy 


( 122 ) 
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Proof. This result follows directly from the proof of Theorem 4.1, by 
equations (336) through (364) and by inspection of the Jacobian J-f which 
is evaluated in (294). □ 


11. Discretization of model parameters 

We will in this section investigate how the INMDL-principle may be applied 
to dednce an upper bound on the discretization on the model parameters. 
We will use the result both to compute a sufficient mesh size on the grid 
on which we solve the nonlinear eqnation which determines the estimator 
9* in the experiments section below, and in our deduction of a model class 
prior distribntion. In the previons sections, by means of Theorem 4.1 and 
its proof and Corollary 5.1, we have established the following formnla for 
the posterior density as 


Pid [^,r,X\x) = 






|F(0Tr*)|V2 

exp (-^{x,$ 


p{e\x)f (x\ci){e,f),ip{f) 


m^^{x) 

\Fie*,f*)\-2 

m^^{x) 

X exp - $*fH{x, 0*)0 - 0*) - h-^^^{e*,X*){X - X* f 


\F{e*,T*)\^/^ 

\Ir\ ■ \h\ 


(2^)(d+2)/2(^)|j^|-l|j^|-l 

I//(a;, , r*) IV21 (6»*, A*) IV2 


-1 -1 


7r(r|A*)/(a;|6>*,r*) Ufo({r-)iK 


2=1 


X exp ( -^x, e*,f*) ) exp ( --(/3 - 0*)'^ H{x, $*){0 - 0* 


1 


* \2 


xexp(--T^^(r,W)(A-W) 


= \H[x, e-, ) 11 / 2 1T I j,.. (9-. 


. d+2 


7r{0*\X*)f{x\0*,T* 
1 


X (271)-— exp { -4>(a^, 0*,r) - ^(/3 - /3*y H(x, /3*}(/3 - /3*) 


1 


* \2 


xexp{--^^y0*,X*)(X-X*) 


n^G((r*)^ir 


U=l 


-1 
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-1 -1 


nPG((T-)ii9- 


. 2=1 




X (27r)-^ exp ( -i(/3 - (3*)i(3 - (3*) 


1 


xexp 


2 


( 123 ) 


where $ (f, O'^Y' with 0, f, 0*, f* defined in (39), (37), respectively, and 

A, A* defined in Proposition 5.1. The ~-relation between the lefthand side 
and righthandside in (123) is due to our omitting terms of order three and 
higher in the Taylor expansion of and T used in formula (123). These 
terms may be found by going through the proof of Theorem 4.1 and Propo¬ 
sition 5.1, but assuming the conditions in Theorem 4.1 under which the 
marginal approximation is valid, they may be omitted here. Now, since we 
do not know the exact form of the prior distribution 7r(0|A), we do not know 
the exact forms of T(0*, A*), 'i'(0*. A*) and the map y : A ^ A. But we do 
know that the transformed Fisher ’’matrix” E[\) is constant and we may 
expect that the transformed Hessian 'kj^j^(0,A) of — log ^7r(0|A)|.E(A)|“^/^^ 
satisfies 

^^ E{\*) = \ = constant. (124) 


By means of (328)-(330) we evaluate the Hessian H to be 



i “ 

bi 

b2 

••• bd\ 


bx 

Cl 

0 

... 0 

Hix,e*,T*) = 

b2 

0 

C2 

... 0 


\ bd 

0 

0 

••• Cd J 


where 


(125) 


« = el 


n — d + 2 1 


n 


2n 


r*)- 2 e*g + l^r*{9*foYxAr*,0*))+ 

71 < ^ ’ 


2=1 


” i=\ 
bi = -ed 


(126) 


2 A 2 _1 
n 


T2 ( (r*)2(a;||(i) - l0*)+ 


+ l<i<d 

Ci = f{l +oAxAt*^*))), l<i<d. 


(127) 

(128) 
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Using (125)-(128) and the determinant formula (291) we may write 


\Hix,e*,n\={a-^i]flc, 


j=i 1 / i=i 


r'^exp 




3 * 11/—2 


2=1 


n-d + 2^ 1 j 2 ^( 5 ?hr-)l 9 r)' 

+ ^ll(r-).r|il -4-E- 


n 


i=l 


2-d 

= exp 


f‘‘ = \F(e\f-)\, 


o 0Si(A-,r-U= J^|(r-)i« 


\-n*\i'—2 

H I 


2=1 


(129) 


where ~ means asymptotical equality as n ^ oo and f ^ 0. We want to 
determine the half-axes of the reduced quadratic form associated with H. 
We proceed with estimating the eigenvalues of H. Letting Id denote the 
identity matrix we have 


det 


(kid - 


det 


/ k — a 
-bi 
-62 


-61 -62 

k — Cl 0 
0 k — C2 


-hd \ 
0 
0 


= \ K — a — 



k-Cd 


J 


(130) 


where we used the determinant formula (291). The first eigenvalue k = ki 
may be found by solving 


Ai — a 


E 

i=l 


Kl - Ci 


= ki — a — 


hi 


-r(l + o(C)) 


= 0 . 


We introduce the approximation Ci ^ t, Vi founded on the asssumption 
C 1 and get the equation 


d 

(ki -a){ki -f) = ^ 6 - 
2=1 
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which yields 


M = ^ 1 ± l + 

2 \ (a + ry 


Using (126), (127) yields 


+ fl' Ejf (-i + iEtio(w,.(r-,9r)K(e*)") 

1± 1 + 4-1^ 

^ \ (ei(l + (2n)-i||(r*)50 *||^)+t) 


Now, using: 

n-^\\{T*)h*\\l<x^ = n{x,T*) 

U-T 

T 

we may write (131) as 


p. _ “ + ( 1 _L I 1 I ,-2;^ -1 + ^2(A*,r*)o(C) 

K\ — - I 1 dz I i “h 6 u T- TT 

2 I V (l + iO(AUr*) + f)^ 


Assuming U(A,t*) ^ 1 or ^ <C 1 or alternatively ^ 1 we may write 

Al«.^(l±l). 

Observing that k = 0 does not solve (130) we finally get 


Ki ^ a + T = a(l + a ^r) a 1 H-- 

V ^^ + ( 2 n)-i||(r*) 20*||2 


a I 1 + 


1 + iU(A*,r*) 


where the ~ here is taken to mean ’’almost equality” if n is sufficiently large, 
d/n sufficiently small, f <C Q{X*,t*) and we use (A*)“^ ~ (i“^||0*||2. Now, 
we estimate the rest of the eigenvalues ki, 2 < i < d+1- By (129) and (133) 
we have that 

e\f^ ^ \H\ = Ylki ^ aYlki ^ el (- -+ -f7(A*,r*)) ki. 
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Now, because Ci ^ f, V i, we conclude that Ci ^ Cj, 1 < i, j < d and from 
(130) we may then conclude that ki ^ kj, 2 < i,j < d+1- Then by (134) 
we may write 

-— ^ + -n(A*,r*)^ ^f,2<i<d+l, (135) 

TX Zi J 



where in the last ~ above we made the assumption that 0 < logH <C d. 
Letting M denote the orthogonal matrix such that M^HM = diag(Kj) 
and define the orthogonal transformation of variables a Plugging 

this change of variables into (123) we see that cx r\j AA(M^/§*,diag(/fi)). 
Now comparing the eigenvalues {ki}^^l we have estimated above with the 
elements of H, we conclude that M^HM = diag{ki}'^^l - diag(iT) 
up to our accuarcy of estimation of the ki above. It follows that the half 
axes of the reduced quadratic form (hyper-ellipsoid) associated with H are 
approximately given by • Expanding the model parameters A, r and 

di, 1 < i < d into their differentials AA, At, A9i we get 


AA 

At 


X'(A)AA 
d'4){T 


df 


At = 


d 

^ To exp 



f At 



(136) 

i/)(f)Af (137) 


= Mill A«. + 


dOi 


dr 


1 1 ^ li 1 ^ /2\2 

= T2^/) 2(f)A6'i - -T2^/) ^ ~ j 


(138) 


Now, denoting the discretization size of 9i by A0j, and the discretization 
sizes of f, A by Af and AA, respectively, we may write the relative uncer¬ 
tainty Ap/p of the posterior density p{6^f, X\x) due to the discretizations 
AA, Af, A9i of parameters X,f,9i, 1 < i < d respectively, as follows: 


Ap 

p 


ldp(^e,X,f\x^ 

P dX 



2 


+ 


ldp(^6,X,T\x^ ^ \ 
p dr j 


2 


+ 


+E 


idp(^e,x,f\x'^ 

P d9i 



21 1/2 


( 139 ) 


We want to bound the relative error defined in (139) over the cell (7^* 
defined by 

X [^2 - A02, ei + A02] X • • • 
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X [91 - AOd, e*d + Add] X [f* - At,t* + Af] x [A* - AA, A* + AA], (140) 


When we consider (123) together with (139) we see that AA should not scale 
coarser than 


AA 




- 1/2 




(141) 


where 0 < < 1 is some constant. Considering (123) together with (125), 

(139) and the argument of approximating posterior covariances above which 
justifies treating H{x,6 *,t*) as diagonal matrix , we see that Af should 
scale no coarser than 


Af ~ a 2 . 


_i f n — d + 2 1 




n 


Ee 


_i f n — d + 2 1 


+ -ll(A*,r*)| -c^ 

n 2 


= eE^V2n-^X*,T*)(l + 2 


n-d+2\ 2 
nid{X*,T*) 


V2efn-HX*,T*)-Cr. 


(142) 


where 0 < Cf < 1 is some constant number. We have omitted terms of 
non-leading order in (126) and we assumed n(A*,r*) ;g> 1 when writing the 
last ~ above. Now, choosing the scaling on A9 as 


_ 1 

- ~ r 2 

A6i ~ -sgn {9i) -^ • cs 


Vd 

where 0 < Cg < 1 is some constant number (see (123)), and the sign conven¬ 
tion sgn {A9i) = —sgn {9i) is just a trick to make \ A9i\ symmetric w.r.t sign 
of see (138). The relations (139)-(143) now yields 


Ap(0,f,A|a;) 
sup -^^- 


< 14 + 4 + 


^ 2 X2' 




1/2 


= i/c| + 4 + c| 


Plugging (141), (142), (143) into (136), (137) and (138) we get 

AT~r^fl-4A*,r*)-Cf 

\Jn 

1 1 \ 

+ ^{X\t*)-cA 


AOi - -sgn [Oi) 


AA~x'(A) Tx4(0*,r) 


1 - 1/2 


(144) 

(145) 

(146) 

(147) 
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We note that (147) may be simplified by observing that 

= x"mx{e,X)+x'C>^f^xx{0,X) (148) 

oX^ 

and noting that 'I';^(0*, A*) = 0 by definition of A* we may by means of (148) 
write (147) as 


1 - 1/2 


AA~x'(A*) x'(A)2ih;,;,(r,A*) -m =|^aa(0*,A*)|-i/2.cc. (149) 


The expressions (145), (146), (147) may be used to deduce an upper bound 
on the discretization to use in encoding the estimated parameters r*, 0*, X* 
while yielding the posterior distribution to within a prescribed precision. We 
note that in [Ris98] it is shown that the MDL-optimal choice of discretiza¬ 
tion of parameters scales like (asymptotically in n). This should not 

be confused with the discretization given in (143): We want a discretization 
which is fine enough to enable us to evaluate posterior probabilities to within 
some specified precision whereas Rissanen want a discretization yielding the 
shortest code length [Ris98], [Ris96]. 


Proposition 11.1. The discretization Ar, AA, A^* on the parameters 
T, X, 6i, 1 < i < d, respectively, given by 


At = T—=0, 2 (A*,r* 


n 


_ 1 

A9i = - — ■ Cg ■ I 1 -|- 


\/d 


AA = x'(A) i!xx{e\X*) 


0 < < 1. 


Cf, 0 < Cf 

< 1. 


(150) 

d r p 

Y'A) 

, 0 < Cg < 1. 

(151) 

n n(A*, r*) 



1/2 

•c^ = |TAA(r,A*)ri/2.c^, 

(152) 


yields the following precision ApiO, f, A|a;) on the posterior density p^^{6 , f, A|a:) 


Ap{0,f,X\x) 

p{e,f,X\x) 


C? g2 _i_ 

-e c- -p C-. 


(153) 


Proof. See discussion above. □ 

11.1. Comments on Proposition 11.1. 

(1) The discretization scheme given above should not be confused with 
the optimal l/\/n discretization given in [Ris98], which is optimal 
in the sense of minimizing the expected difference w.r.t the worst 
data generating distribution g between code lengths using the code 
length induced by any distribution q{x) on data and the code length 
induced by f {x; (3* {x)) , see [RisOl] and (70). The discretization 
shown in Proposition 11.1 was developed to be the coarsest possible 
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yielding the posterior distribution to within a prescribed precision. 


(2) We see that the discretization of Oi given above is data driven and 
implying a discretization that may well be finer or coarser than the 
MDL-optimal discretization of [Ris98]. It will generally 

lead to a finer discretization if Cq < \Jd/n and a coarser discretiza¬ 
tion if Cg > ydjn. Also, we get coarser discretization for those 

indices i where | > V^- 

12. A formal approximative generalization to non-ganssian 

models 


The results we have obtained so far were deduced for models with 
IID gaussian likelihood distributions. However, it is possible to general¬ 
ize the results to the case of non-gaussian IID likelihood models under some 
(smoothness) conditions on the distribution. The argument goes as follows: 
Given a IID non-gaussian likelihood f{x\9,a) = f{xi\6i,OL), where 

Exi[xi] = and a = (ai, ...,as) are parameters of the distribution /, com- 

def 

pute the Taylor expansion of Qol{Q\x) = — log f{x\9, a) about 9 = 6q = x\ 

n n 

Tq^{0\x) = '^Qoc{xi\xi) + '^a{xi\a){9i - Xi) 
i=l i=l 

1 

+ 2 X ] + Rol{0\x) 

i=l 


where 


Ra{0\x) ^ ^ / c{zi\cx){9i - Zif dzi, a(xi|Q;) =* 




def dQci{9\Xi 


6 


i=l 


d9 


, / I X def d‘^Qoc{9\Xi) 
b{xi\a) = 


d9^ 


, I ^ def d’^Qoc(9\Xi) 
, c{xi\cx) = 


Oi — Xi 


89^ 


9=Xi 


9=Xi 

(154) 


Truncating the expansion Tq^{0\x) to second order in 6 will yield an ap¬ 
proximation g{x\0,OL), which is a gaussian function of 6, to the likelihood 
model f[x\6, a) and we may write 


f{x\e, a) = g{x\e, a)Za{0\x) 


where 


g{x\e,a) = exp -"^Qaixilxi) + ^ 


i=l 


a{xi\a)‘^ 
^ 2b{xi\ot) 


exp [ ^ b{xi\a) (9i- Xi + 


i=l 


a{xi\oi) 

b{xi\a) 


(155) 
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and 

Za{6\x) = exp(-i?Q,(0|a;)) . (156) 


Although the second order approximation g(x\9,a) in general will be a 
poor pointwise approximation to the density f{x\0,Oi), it may locally in a 
vicinity old = xhe sufficiently accurate to be used to compute the marginal 
integral f /(xjd, a)7rx(9)p(A)((cx) dd dot dX to within the desired accuracy. 
An analysis of the remainder term Zai6\x) will have to be carried out for 
the given likelihood / to decide if this is the case. If so, we may define an 
approximative Fisher matrix F to the likelihood f{x\0,ct) by 


F),(/3) -E, 
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dfdidp. 


logg{x\(3) 


where /3 (a^ ^ Q 


3T\T 


(157) 


Then we may proceed similar to the steps taken in (39)-(43) to find the 
reparameterizations 4> : 6 ^ 6, tp : a ^ ot which makes the reparameterized 
Fisher information |i^(0,Q!)| a constant. In at least some cases of interest 
the reparameterizations defined in (39)-(43) should still apply with minor 
modifications and so would the (proof of) result in Theorem 4.1. 




CHAPTER 3 


Applying the INMDL-principle to GGD-modelled 

data 


1. Preliminaries 


We will investigate the performance of the INMDL-principle as devel¬ 
oped in previous sections when applied to GGD-modelled data. The GGD- 
model is frequently used when representing natural images in wavelet bases 
[ML99]. Having found the invariant noise estimator t* in (413), we need to 
compute the invariant estimator 9* defined in (37) under the GGD-model. 
The GGD family of distributions is a two-parameter family governed by the 
variance-parameter 4 > 0 and a shape parameter i/ > 0 and has the form 
[ML99] 




r]{u)X2\9\ 


where 




def 


^(lA); 


(158) 


Under the assumption of HD additive white gaussian noise (WGN) the prob¬ 
lem to solve is 


9* = arg min[ 0 ]^gR - 9 f - logTTA.i 
= argmin[ 0 ]^gR|^(x- 0 ) 2 + r]{v)\^\9\ | 


= arg minrei gR [x - 9]l + 2r]{i^y 




■I [0k 


(159) 


We will in the following consider the case 0 < u < 2. We define 


A{u,X,T) = 2r^{uy 
9 A2^ {u, X, t)9 

def . -J— ( . X _ 

X = (z/, A, t)x. 



(160) 
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(161) 

(162) 
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We see that the problem to solve may be written 

9* = arg min[ 0 ]^gR {[x - e]l + |[0]„r} (163) 

The equation (163) may be solved numerically by means of standard numer¬ 
ical software or simply by linear interpolation as follows. Define 

R{e) = [x-e\l + \[eu'' (i64) 


assuming 0 7 ^ 0 we may then write 

^^ = -2[x-0], + p-sgn([0y|[0],r-\ [4/0. (165) 

ad 

We observe by (164) that R{6) is a convex function of 9 for 1 < u < 2 and 
therefore 9* is given by = 0. It was shown in [ML99] that in the case 
0 < p < 1 there exists a threshold > 0 such that \x\ < 9* = 0 with 


t D 

iiy — -L/jy 


[A] 


uf2 

2 ^ 


—, 0 < p < 1 


[r]t^ 


where 


Aof l — u 1/ 

Dy = (2 — p)(2 — 2 p) 2 - 1 . 


(166) 


(167) 


This yields 

— _/4£sfl 1 1 — u 

[ 9 *]^ = 0 4 ^ |[ x ]/ < U = = 2“^(2 - p )(2 - 2 p )“~. ( 168 ) 

We note that one may show that 0<p<1^0<ti,<l. We observe that 

^ [x]. = [rj. + ^sgn([r]„)|[r]„r-\ [r]„/o. (i69) 

e=e* ^ 

The expression (169) applies to |[x]v| >iy liQ <v <1 and (169) applies to 
all [x]t, if 1 < p < 2. We further observe that the GGD-MAP estimator 9*{x) 
and 9*{x) exhibit step discontinuities at x = x = respectively, 
when 0 < p < 1: By (169) we see that 

iy= lim [x]^, = lim |[0*(x)]^,^sgn(4(x)]^)|4(x)]^,/“^| (170) 

^^K Z ) 

and while the lefthand side of (170) is finite, the righthand side increases to 
-|-oo as 0 * ^ 0^, if 0 < p < 1. Therefore, if 0 < p < 1, there must exist a 
number Sjy > 0 depending on p such that |x| > ty ^ \9*{x)\ > Sy > 0. The 
size Sy of the step discontinuity may be computed (numerically) for given 
0 < p < 1 by solving 


0 = 


dR{9) 


d9 


( 171 ) 
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We note that by (171), (168) we have ^ ^ 1 as u ^ 0 '^ and Si, 

itv — 1/2) ^ 0 as p ^ 1“. 


One can compile lookup tables of pairs of corresponding values {x,6*) to the 
equation (163) by discretizing 9* to some specific precision A9* and then 
use equations (169), (168) to compute corresponding pairs of values {x,6*). 
Since we ultimately want the estimated value 9* to some precision A6*, we 
have to ensure that the lookup table of pairs of values {x,9*) is computed 

on a sufficiently fine grid with stepsize A9* yielding a sufficient precision 

1 

A9* = A^-'^ {u, X,t)A6* when transforming by the formula (161). Letting 
5 > 0 denote the desired precision on the parameters 6*, then it suffices to 
demand 

5>A^{v,\,t)A9*. (172) 


Rissanen in [Ris98] computed the asymptotically MDL-optimal discretiza¬ 
tion 5* on the parameters which parameterize a n-variate distribution. In 
Proposition 11.1 in a previous section we presented a result on the posterior 
optimal discretization of parameters which deviates from the MDL-optimal 
5* in that it suggests a data-driven, possibly coarser discretization of the 
parameters. However, Proposition 11.1 shows that MDL-optimal discretiza¬ 
tion 5* is a lower bound on the posterior optimal discretization 5 (since 
Ee[A0^] = 1 and djn < 1), and so in the n-variate IID case of a gaussian 
likelihood with deviation cj we will use 


(5* = ^ 
'n 


(173) 


and by (172) we then find an upper bound for A9* to be 

[A]/ 


_ 1 _ i_ 

- T 2 1 T 2 

Ar < ^ • A“^(n,A,r) = 


1 

2-v 


n 


1 


n 


2r]{u) 


JA]i 


u \ — 


1 

2-u 


n 


1 


[T]d 


n 


27]{u) 


g^(A,r) 

2^ry(i/)2 


(174) 


For most datasets of interest we may bound D(A, r) from below by 1 (which 
means that we exclude data models where the noise in the data has greater 
power than the signal part of the data). We define 


Ax ^A9* 

d9* 

then by differentiating (169) we get 


Ax = 1 - 1 - 


Ar, [ 9 % / 0 


(175) 


( 176 ) 
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which corresponds to 

Ax = (^1 + A, r) I Ae*, [0*]. ^ 0 (177) 


Given a data value x, gridpoint pairs {xi,6*) and (xj+i, with Xi < x < 

Xj+i, we define the estimated parameter value 6*{x) by the linear interpo¬ 
lation 


§* + Xi) ■ 5+^ 


^ 2+1 


(178) 


Expression (176) can be used to compute a bound on the interpolation error 
for 9* for given gridsize A9* . The interpolation error A9* in the linear 
interpolation estimate 9*{x) may by equation (176) be bounded as follows 


A0* < min (|x — Xi\ 

, |x - Xi+l|) 

sup 

i+'^'T'Ori.r 






(179) 


2. The marginal normalization for GGD priors 

We need to calulate the Fisher matrix E{X) defined in (61) and 'I'(0, A) 
defined in (62) and the invariant estimator A* defined in (63) to be able to 
compute 'I'aa(^*) A*) which is part of the formula for the marginal distribu¬ 
tion given in Corollary 5.1. Plugging the definition (158) into the defining 
formulas we get 


^ d‘^log7:{e\X) 

EW = -Es -- 


- 5 ( 5-0 


Now, a straightforward calculation yields: 

Ee{\0n = (^^ 0 ) 

and we may then write 


EiX) = 


^ + 5(5-ij>,MA 


ud/A 


( 181 ) 
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We may then calculate A) as 
7r(0|A) 


'I/(0,A) = - log 


L|£l(A)|V2j 


and A* then becomes 


A* =^arg m4>o^'(0, A) = 


= ^ log (y) - log[A]^ - log [7r(0|A)], 


{d + ‘If!'' 




2lv ■ 


(182) 


(183) 


For notational convenience, we define 
d 

def 


KW = 


(184) 


i=l 


We may now proceed to calculate 
'i'AA(e,A) = ^ + 


d+2 


A-^ 1 + 


12 (V 


2 '■ V ' d + 2 V2 

and we may now by means of (183) evaluate 


(185) 


(186) 


We may now calculate the quantization induced by the mapping x(A) on 
the parameter A as described in Proposition 11.1. We have 


AA = x'(A)|'&(r, A*)|-i/2 . ^ ^*)|-i/2 . 


\l y{d + 2 ) 


• A*•cx 


(187) 


We may now calculate the map x • A ^ A which defines the invariant 
parameterization A. Define the log likelihood L by 


L(0)"=l'-log 


TT ( 0 


= - log 
2 ^ 


X(A) +dlog 


v'q{v) 




2=1 


(188) 
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Now, the Fisher matrix E{6) in the invariant parameterization A is defined 
by 


To make the parameterization A invariant, we have to demand 


(189) 


|^(A)|=A“2 (190) 

where A > 0 is some constant number. This yields the equation 


-E, 


{l( 0)} = r(A) = CO + ciA + ^A'^A^ 


(191) 


for some real constants cq, ci- Plugging in the expression L{0) from (188) 
into (191) yields 

^(A) = -^log x(A) ^-dlog^^^^^ +x'''^^(A)r?(p)''^£’0{|6»ir} 


= — log 
2 ^ 


«'J„ l,2r(i)y r{i) „(0'' 


Now, solving for x(A) yields 
-2 


X(A) = 


vr}{v) 

,2fTy 


exp (2/i/) exp 


A + ciA j 2co - cf A2 


V 


<iA2 


Specifying the initial condition x(0) = Aq, we get 


X(A) = Aoexp 


/ 


V 


A T Cl A^ 


dA2 


+ 


cfA2 


(192) 


Using (183) we now evaluate the integral 

^ ^ v{d + 2) \^ / ur]{u) 

\2T{l/v] 


f nm*) 
lees |^aa(0,A*)|V2 


de 


X ^ exp ^-i?(ia)-(A*)"/2 de 

_ /p((i +2)\“^^^ / iyri{iy) 


d+2 


V2r(iA 


d + 2 \ I (d + 2)' 


v^r]{u) 
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d+2 



Now, using 


(193) 


0 


e G 


o<{r.r <Y,\e,r 

i=l 


< 


El".'!" 

i=l 



(194) 


and performing a suitable change of coordinates (see [GROO], page 610) the 
integral (193) evaluates to 


TT{e\x* 


/6»ee 


|^aa(0,A*)|V2 


de 


v{d + 2) 


4 

l^d 


r(l) 


- 1/2 


-2 


vr]{v) 

2r(i/u) 


exp 


d + 2 \ ( (d + 2) 1 




-£-l 


dx — R 


-2 


X ^ dx 


■"'r (i) 

= (2ir)-C2(d + 2)i-idiri(v)-'v-ir-^ 1 1 - (tt) 1 

\ \ Ry) I 


d+2 

2 


(195) 


This result may be plugged into Proposition 6.1 to yield the precise code¬ 
length contribution from the term logC'..y^. We observe that it will only 
contribute constant terms plus a (2/u)logd term. 


3. The model selection algorithm for GGD distributed 

parameters 

By Proposition 8.1 we see that we will have to investigate the behaviour 
of — log7r(0*|A*). Choosing the ML estimator (183) for A*, we get 


C(®||(i)|5i_i) '^+-{n-d + 2)^ 

II® lb 

_^log 

^l®llWI \\^xx{0*,X*)\V 


= — (n — 


-|- (d -|- 2) 


^1*11 WI 


1 < i < d 


(196) 
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It is easy to see that with a possible exception for the derivative term, all 
terms in (196) are decreasing functions of |£C||(i)|. As for the derivative term, 
we see from (169) that this term is positive and bounded by 1 for 0 < p < 2 
for |®||(i)| sufficiently large and so we may conclude that C'(a;||(i)|5i_i) is a 
decreasing function of |a;||(i)|. Therefore, the d nonzero elements of a;|| G M" 
are the d largest \x(i)\ in the dataset x G M”. 


4. The approximation errors for the GGD model 

We need to control the approximation error terms k and ^ as defined in 
the proof of Theorem 4.1. An easily computable upper bound for the error 
term k is given in Theorem 4.1. The upper bound for the error term ^ as 
shown in Theorem 4.1 may be considerably simplified in the case of a GGD 
prior distribution on the noiseless data. We have the following result: 


Proposition 4.1. One may verify that the GGD distributions satisfy the 
eonditions in Theorem 4-1. Using the notation from Theorem 4-1 we may 
state the following upper hound on the number f for GGD prior distributions 

d 


^ + 1 < exp exp ^ d 

nxffi) 


log2 ) - ^ri||a;||||^ -^log 


d 


+E 

i=l L 
where 

T* exp 


r]{u) 






( 2 ,) 


i=l 


i=l 


r-rxw 


TfXu{l) 



(197) 


log^ N{X,n,-fd) 


1 / 2 - 


< Ti < exp 


L,, i r2x I = 


log^ N{X, V, 7d) 
^(A,P,7d) 



1 / 2 - 


v-l 




(3«(Lt))- 


1 + 


1 

T2X 


m(x,T)y 


if 0 < u < 1 
if 1 < u < 2. 


Proof. First, by (56) and the fact that 'K\{x) is a monotone decreasing 
function of |x|, we observe that 


?+i<n 

i=l 


1 + 


2Pg ( -rf |a3||(i)| ) 7rA=i(0)/7rA=i 


Tl ajy (2) 


l + erf(r]^'|®l|(i)|) - { Tfx\\{i) 


( 198 ) 
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Plugging (158) into (198) we get 

2Pg (-t^ |»^ii(6I ) I nii'Y 


f+i<n 


2=1 


1 + 


ria: 2 (j) 


1 + erf(ri2|a;||(i)|) - 


(199) 


Using the bound Pci—t) < t ) V t / 0, we may write 


2=1 


f+i<n 

i- 

( 


2 exp -iri£c^(i) - log 


1 + 


TtX 


II « 


+ 


r]{vY^ 


2 


r 2 (A,ri) 


1 + erf(ri2|®||(i)|) - { U"*||(0 


< exp 


E 

2=1 


V 


2 exp —(i) — log 


U "®||(0 


+ 


, NO ^ 

fTWl) 


l + erf ( ri "| a ;||( z )|) - ( u "®||(0 


del 


v-1 


where 

Lu ) =' 
and 

Ti G /t = exp 

/21ogA^(A,p,7d) V 


r2|x|j (fll(A,r)) 2 if0 <zo<l 


( 200 ) 


( 201 ) 


if 1 < JO < 2 


/ 21ogjV(A,jo,7rf )^^ 2 


T exp 


- 




( 202 ) 


We see that the righthandside of (200) makes no sense when lo ^ O"*" because 
lim^^o+ rj{vY = +oo and also the validity of expression ( 200 ) depends on 


2??(p)^p ^^ I r]2a;||(i) ) < 1, V i G 7 ^. 

(27r)2 ' ' 


(203) 


This lack of generality is due to our choice of technique for estimating ^ in 
the proof of Theorem 4.1 where we implicitely assumed 

7rA=i ^011(A,r)^ r2x^ /(sup7rA=i(t)) > 2ri{uY u{2ti)~^L y{T^x\\{i)), 
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V i G 7d, Vr G Ir, 


( 204 ) 


and is therefore not due to an intrinsic property of the model. Now, because 
of (203) we have 


- 1 <-2r/(p)'^p(27r) 2 Li,(rj^a;||) + erf I |a;||(i)| J <1 


(205) 


and by the inequality log(l + x) > x — \x‘^, V |x| < 1 we then have 
log ^1 - 2ri{vyLy{T^ x\\)+ evi |x|| (i)|^ ^ 

> 2r]{uYu{2TT)~^Ly ^r]2®||(i)^ + erf |a;||(i)|^ 

- \ (^-2r]{iyyu{2Tr)-^L^ ^ri2a;||(i)^ + erf |x||(i)|^ ^ . (206) 


Using (206) on the expression (200) enables us to write 

d I , id 


1 


C + 1 < exp exp { -^ri||x ||||2 -'^log T^x^i) 


2=1 


2 = 1 L 


r]{u) 


Tix'jii) 

7^(A,ti) 


+dlog2 + 2r/(iy)'^z/(27r) 2 (r^ xyi)\ - J]]erf |x|| (i)]") 

i=l A / V / 




2=1 L 


2r]{vyu{2^) 2^Ly(T^x\\{i) \ +^erf (ryxyi 


2=1 


2 = 1 


(207) 


< exp exp <( d ( i + log2 j - ^ri||x||||^ “ 

A i=i 


+E 


2=1 L 


77(1/) 


2 

TlXy (i) 

W^) 


d 


■^X\\{i) 

+ 2r]{vyv{2Ti)-^2'^Ly fr^^xyi) 


2=1 


(208) 


and we may easily evaluate an upper bound on the righthandside of (208) 
by evaluation with 


Ti = T* exp 


/ log^iV(A,p,7d) y'^^ 

V ^(A, U Id) ) 


and 

Ti = T* exp 


/ log^N(A,p,7rf) y'^^ 

V ^(A, U Id) ) 
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with iV(A,u, 7 rf) as given in Theorem 4.1. 


□ 


5. Numerical methods and experiments 

In this section we show the performance of our INMDL-algorithm when 
applied to the problem of estimating various kinds of 1-dimensional data 
(signals) and 2-dimensional data (images) embedded in IID gaussian noise 
(the ’’denoising”-problem) and we will compare the performance of INMDL- 
principle developed in the previous sections to various other kinds of denois¬ 
ing algorithms. Detailed numerical results are shown in the appendix while 
a graphical overview of estimator performance is shown in Figures 4-7. We 
will here focus on the NML-principle of Rissanen as presented in [RisOO], 
the RiskShrink-thresholding algorithms as presented in [DJ94], [BG95a], 
[BG95b], (that is the universial hard thresholding scheme with threshold 
cj-v/2 log N where ci^ is the noise variance),the SureShrink-thresholding algo¬ 
rithm given in [DJ95] and the MAP-estimator deduced from an IID GGD 
model applied to the full data set [ML99], that is 

X = 6 + T), 0,r],x G (209) 


and 


'i- <i < N, r]i^ Af{0, T ^/^), 1 <i < N. (210) 

where is a GGD distribution with mean zero, second moment and 

shape parameter u. Note the difference from the model defined in (25)-(26) 
from which we deduced our INMDL principle. This MAP estimator equals 
the estimator called Tmap defined in [HYOO], except that we use the exact 
MAP estimator (up to interpolation errors in the numerical approximation of 
this estimator, see expressions (174)-(179)) for general values on the shape 
parameter u, whereas in [HYOO] they use the MAP estimator for u = 1 
which is the soft thresholding operator (2). We adopt similar notation for 
this estimator; We write where v signifies the shape parameter in 

the GGD distribution. To make the conditions under which our reported 
numerical experiments were conducted, as clear as possible, we list some 
remarks: 

(1) The image data used in our experiments were mostly collected from 
the USG-SIPI Image database at 

http://sipi.use.edu/services/database/Database.html, 
see Figure 1 and Figure 2. The one dimensional signals used here 
are the standard examples used and defined in [DJ95], see Figure 
3. 
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(2) All images used in experiments are bitdepth 8 gray level images of 
size N = n X n = 512 x 512 unless otherwise is specified. The one 
dimensional test signals are of length N = 1024 unless otherwise is 
specified. 

(3) In the tables shown in the appendix, results obtained from datasets 
X G with computer generated noise are shown. The definition 
of signal to noise ratio (SNR) of the dataset x = 6 + r/ used for 
signal 0 G and noise r] G when generating datasets x with 
different SNR values is; 

SNR=101ogiofTlt!|2V (211) 

where / signifies the gaussian likelihood distribution. 

(4) The SNR measure used when reporting signal to noise ratios in the 
estimated signals 0* in the tables in appendix is; 

®? = l»lo8io(pl0j|)- (212) 

(5) The error measure used in tables below will be a scaled version of 
the root mean square error (RMSE) defined by 


RMSE 



(213) 


where 6 is the estimate of the signal 0 and is the variance of 
the noise ij. 

( 6 ) For the RiskShrink, SureShrink and Tmap algorithms the noise 
variance was estimated from the highpass band using the me¬ 
dian estimator, see [DJ95]. Also for the Tmap algorithm we esti¬ 
mated the signal variance by the moment estimator A* defined 
by; 



(7) The INMDL principle was implemented by an iterative scheme 
in our numerical experiments as follows; The NML principle of 
[RisOO] is used to provide an initial estimate of the best model 
7 ^* from which we compute initial estimates t* , X* of variance pa¬ 
rameters r and A and then an initial estimate 0 * of the wavelet 
coefficients 6 of the data is computed. These parameter estimates 
are then fed into the model selection principle as defined in Propo¬ 
sition 8.1 and a new estimate of the best model 7 ^* may then be 
computed and the iteration process continues with new updated 
estimates r*. A*, 6* and so on. The GGD shape parameter p is 
also estimated in each iteration step using the estimate 6* and the 
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estimator u* provided in [DV02]. This whole model selection it¬ 
eration procedure continues until changes in the estimates of the 
optimal model size d* between two iterations falls within 5%. We 
also note that the number Cx^ in the model class prior distribution 
D{p,q) in (93), (94) was set to Cxp = 1.0 in all our experiments 
reported below. 

For image experiments, we show results from the GGD MAP es¬ 
timator T^\p for values v = 1.0 and u = 0.7 on the GGD shape 
parameter. The reason for our choice of these values, are that ex¬ 
tensive empirical investigation [ML99] show that a GGD model 
with V G (0.5,1.0) provides a reasonable prior model for many if 
not ’’most” natural images. Also, the choice of u = 1 yields the 
Laplace distribution which is very often used as a model distribu¬ 
tion in the image denoising community because one then can obtain 
closed form analytical solutions to estimator and risk equations in 
the case of gaussian noise. 

For the experiments with 1-dimensional signals, we show results 
from the GGD MAP estimator for values v = 0.5 and 

V = 1.0. Unlike the case of image data, we have in this case no 
prior knowledge which supports a choice of a GGD model for the 
data. However, the wavelet basis is known to yield sparse represen¬ 
tations of piecewise smooth signals [DJ94], so a GGD distribution 
with u < 1 could be worth a try. The choice v = 2 yields a gaussian 
model distribution, which maximizes the entropy for a given vari¬ 
ance, but this choice turned out to yield a very poorly performing 
estimator T^(^p, so we omit it. 

The ordinary full depth periodic wavelet basis with a symmlet of 
filter length 16 (Symmlet 16) was used as the wavelet basis in all 
the image experiments. 

All numerical experiments reported in this thesis were carried out 
on a 2.0 GHz Pentium4-Mobile PC with 768 MB RAM running 
FreeBSD-4.9 as operating system. The experiments were all imple¬ 
mented in the C programming language except a few cases were 
we have been using the NAG Fortran Library Mark 16 for some 
standard mathematical functions and random number generators. 
The C compiler used was Intel C-|—|- compiler version 7.1 (build 
20030922Z). 

We note that even though our implemented version of the INMDL 
procedure is quite computing intensive, it runs in 0{N \ogN) time, 
and typically on our 2.0 GHz Pentium4 PC with image data with 
N = nxn = 512x512, the run time is about 50-90 seconds when the 
source code is compiled with full optimization. The computational 
bottleneck by far is the computation of the GGD-MAP estimate 6* 
by linear interpolations. However, we have not gone to any effort 
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in optimizing our implementation for speed. Considerable speed 
improvements may be possible. 

(13) The same noise realization was of course used when comparing 
the different algorithms shown in tables below. We only report 
results obtained from a single realization of the noise because we 
found that the both SNR and RMSE results for all the denoising 
algorithms in the tables deviated by less than 1% over 3 different 
noise realizations when used on test image ” Barbara”. 

(14) In the experiments on images below, we checked the validity of our 

asymptotic marginal formula in Theorem 4.1, Corollary 5.1 and the 
marginal renormalization constant in Proposition 6.1, by checking 
(the upper bounds of) the numbers k < T^, ^ < T^, oj < X. 
These were found to vary as; l.OE—3 < < 9.0E—2, 1.0E2 < 

< 4.5E9, l.OE-3 < log2(l + T^)/d < A.OE - 2, l.OE-2 < 

< 9.0E-2, l.OE-3 < X < l.OE-2, l.OE-3 < C < 3.0E-2. 

Furthermore we observed that 2.4 < infi<j<(i < 4.0 always 

for the test images used. The posterior biases shown in Corollary 
10.1 were found to be of insignificant size; j 0.01% of the estimator 
values 6*, 1 < i < d and t*, for all of the test images. We emphasize 
that although the numerical values of ^ were found to be large, the 
contribution from the term (1 + ^) to the codelength is given by; 

— log 2 (l + ^)/d per model sample in the mean, and this is found 
to be of the same order per model sample as the uncertainty ±0.5^ 
in the codelength contribution from the marginal normalization 

— log 2 (see Proposition 6.1) which we have explicitely neglected. 

(15) For the experiments on 1-dimensional data below, we checked the 
validity of our asymptotic marginal formula in Theorem 4.1, Corol¬ 
lary 5.1 and the marginal renormalization constant in Proposi¬ 
tion 6.1, by checking (the upper bounds of) the numbers k < Tk, 

^ < T^, C, uj < X. These were found to vary as; l.OF—3 < 

< 5.0F-2, l.OFO < < 4.0F0, 3.0F-3 < log2(l + r^)/d < 

2.0F-2, l.OF-2 <T^< 5.0E-1, l.OE-3 <X < 2.0E-2, l.OE-3 < 
( < 5.0E—2. Furthermore we observed that 1.96 < infi<j<rf \V^d*\ < 
3.7 always for the test signals used. The posterior biases; Eq t-{6* — 

9i) and;£'e^T-(r* — r) shown in Corollary 10.1 were found to be of 
insignificant size; j 0.001% of the estimator values 6*, 1 < i < d 
and T* , for all of the test signals. 
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5.1. Discussion of experimental resnlts. When applied to image 
data, the SureShrink method [DJ95] clearly outperforms all of the tested 
estimators over the whole range of tested SNR values as seen from Figure 
4. The SureShrink method is a hybrid method between a soft universial 
thresholding scheme as given in [DJ94] and an adaptive thresholding scheme 
given by adapting the thresholds to minimize a risk estimate using Steins 
unbiased risk estimate (SURE) given in [SteSl]. The hybrid scheme of 
SureShrink decides in each wavelet subband whether the signal is sparsely 
represented in the subband. In sparse situations the universial thresholding 
scheme is used, otherwise the SURE method is used to provide risk estimates 
in each wavelet subband. Thus, different adaptive thresholds are used in 
each subband by the SureShrink, whereas the other methods use a global 
(identical in all subbands), although data adaptive, thresholding scheme. 

Comparing the NML and INMDL-principle we note that the NML- 
principle does not have a robust performance for the datasets tested here, it 
fails badly compared to all the other methods as the SNR falls below 10 on 
the dB scale as may be seen from Eigure 4 and Figure 5. The performance of 
the INMDL-principle in the region of low SNR is the second worst method 
measured in RMSE for SNR < 5 dB , but it does not fail as bad as the 
NML-principle. Coupling these observations to the information in Figure 7 
and Figure 6, we conclude that the main explanation for the observed weak 
performance of NML and INMDL in the low SNR region, is that the sizes of 
the optimal models as predicted by these model selection principles are too 
large, this behaviour is especially clear for the NML-principle. The Figure 
4 shows that the INMDL-based estimator has second best performance of 
the tested estimators for image datasets in the SNR range 10 dB < SNR 
<15 dB. For image data in the high SNR region we see that performances 
of both NML and INMDL weakens as the SNR increases when compared to 
the GGD-MAP estimators and the SureShrink principle. Figure 6 explains 
why: The predicted optimal model sizes are too small in this SNR region 
for image data. However, for the 1-dimensional test data the situation is 
reversed: As the SNR increases the performance of NML and INMDL based 
estimators improves and outperform the SureShrink and the GGD-MAP es¬ 
timators. The Figure 6 explains why: The SureShrink and the GGD-MAP 
estimators keeps too many wavelet coefficients for this type of data whereas 
the model sizes as predicted by the NML and the INMDL principles yields a 
smaller number of nonzero wavelet coefficient estimates which closely match 
the RiskShrink estimator both in performance and sparseness of the wavelet 
coefficient estimates. We note that the RiskShrink estimator 0) is known 
to be universally near-optimal in the sense that to within a logarithmic fac¬ 
tor it achives the ideal risk obtained with an oracle estimator [DJ94], that 
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is 


E^\\e^^^'){x)-eg<{2\ogN + i) 



N 


+ min(0. 


2 ^ 2 ', 

i ^ ) 


i=l 


,y ogr^ 

(215) 


and no estimator can come closer to the ideal risk than 0^^^^ for all 0 G 
without relying on an oracle. 

Figure 4 and Figure 5 indicates that the tested estimators perform quite 
differently relative to each other for a given SNR level, depending on whether 
the data belongs to the 2-dimensional test datasets or 1-dimensional test 
data in our experiments. The explanation may depend on several factors: 
The sample size N which in the experiments here defer by two orders of mag¬ 
nitude between the two-dimensional and one-dimensional datasets. How¬ 
ever, we verified (see remarks above) that the parameters controlling the 
error on our marginal approximation formula are well inside the required 
intervals for both sample sizes N = 2^® and N = 2^^ in all our experiments. 
Therefore we do not believe the observed differences in performance are pri¬ 
marily due to differences in sample size here. The ability of the wavelet 
basis to sparsely represent the data in the wavelet domain ("few” large and 
many ’’small” wavelet expansion coefficients) is important. In this respect 
we note that wavelet bases are known to optimally (in a certain strictly 
defined sense) [Mal98a] represent data inside a ’’ball” of bounded total 
variation functions and a large class of ” natural” images belong to this class 
of functions [DJ95], [KM03]. 
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Figure 1. Test images. 
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Figure 2. Test images 
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Scaled RMSE meaned over test images plotted against SNR for the tested estimators 
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SNR of the data to be denoised (dB) 


Figure 4. Mean estimator performance on the test images. 





Mean RMSE of denoised data as a fraction of the noise deviance (%) 
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Scaled RMSE meaned over test signals plotted against SNR for the tested estimators 
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SNR of the data to be denoised (dB) 

Figure 5. Mean estimator performance on the 1- 
dimensional test signals. 





Mean fraction of nonzero wavelet coefficients of denoised data (%) 
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Sparsity of the wavelet representation of denoised data meaned over test images 



SNR of the data to be denoised (dB) 


Figure 6. The sparsity of the wavelet representation of the 
denoised test images measured as the fraction of nonzero 
wavelet coefficient estimates. 





Mean fraction of nonzero wavelet coefficients of denoised data (%) 
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Sparsity of the wavelet representation of the denoised data meaned over test signals 



SNR of the data to be denoised (dB) 

Figure 7. The sparsity of the wavelet representation of the 
denoised 1-dimensional test signals measured as the fraction 
of nonzero wavelet coefficient estimates. 
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INMDL, SNR = LOdB 



INMDL, SNR = 20.0 dB 



NML, SNR = LOdB 



NML, SNR = 20.0 dB 



Figure 8. The code length as a function of model size for 
the test image boat. 






CHAPTER 4 


The INMDL-principle applied to an inverse 

problem 


1. Definition of problem and data generating model 

We will investigate the performance of the INMDL-principle when ap¬ 
plied to the problem of estimating signals or images 6 which have gone 
through a degradation process modelled as 

y = u * 9 + r] (216) 

where u is a known lowpass filter and y is IID gaussian noise and * de¬ 
notes the convolution operator. We will rely on and use as reference work 
presented in [KMR03] and [KM03], in particular we will use the mirror 
wavelet basis constructed in the cited papers, see appendix. The motivation 
behind our investigation into applying the INMDL-principle to the deconvo¬ 
lution problem (216) is our experience from numerical simulations concern¬ 
ing the denoising problem in the previous section that the INMDL-principle 
as developed in previous sections seem to be very robust against high noise, 
and so one could expect that the INMDL-principle would eliminate the 
need for ’’hard” regularization techniques like the cutoff-frequencies kc in 
the Fourier domain introduced in [KM03] or the modified threshold estima¬ 
tors in [KMR03] demanding the a priori knowledge of the numbers SB[m] 

which are SB[m] supj-g 0 |(/, bm)\ where bm are elements in an orthogonal 
basis B and / belongs to a predefined set of signals (datasets) 0. Also the 

INMDL principle does not need to know the noise level E{r]‘^) before¬ 

hand. In addition, the use of a prior distribution tt in the INMDL-principle 
allow for a more sophisticated modeling of the wavelet coefficients than in 
the papers [KMR03] and [KM03], this may enable a better reconstruction 


of the degraded data. Formally deconvolving the data x 
yields 

in expression (216) 

def _ 1 ^ , _ 1 

X = u * y = 9 + u * rj 

(217) 

where the inverse u~^ is defined by 

(218) 

where 


u{(w) iF{u){u) 

(219) 
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and T denotes the Fourier transform. When the Fourier transform of 
the inverse filter \Iu{lS) is not bounded in the high frequencies, the noise 

Z u~^ *7] resulting from the deconvolution of data y in (217) is amplified 
by a factor that tends to infinity. Therefore, in general the deconvolution 
problem (216) is an ill-posed inverse problem, and solutions to this type of 
problem must include some kind of regularization procedure for removing 
the worst part of the deconvolved noise Z. The INMDL-principle natu¬ 
rally provides a regularization through a model selection process and we 
will now investigate how the INMDL principle may be adapted to decon¬ 
volution problems. Assuming the convolution is circular, we may write the 
discretization of (216) on the form 

y = Ue + r] ( 220 ) 


where U £ is the matrix representation of the smoothing operation 

by convolution by the lowpass filter u. We define 


X U~^y = e + U~^ri ( 221 ) 

Z (222) 

K E[ZZ'^] = (223) 


The mirror wavelet basis W £ [KM03], see appendix, approximately 

diagonalizes the covariance K of the deconvolved noise Z, that is for Wf^ £ 
W, 1 < k < n we have 


K W^KW ~ diag (w'^KW^ ~ diag {{Kwk, Wk)) 
= diag ((Kwk,Wk 


l<k<n 


( 


l<k<n 
2 ' 


= diag 


-E 


\Wk[l 




;1I2 


= Kn 


(224) 


l<fc<n 


where Wk = WpWk is the discrete Fourier transform of Wk and K = 
T~^'WpU~^U~'^WF and Wp £ is the discrete Fourier basis on C”. 


2. The model selection algorithm 

We would like to be able to use our previous results to compute an 
approximation to the marginal density of the deconvolved data x. However, 
in the current case of a non-constant diagonal covariance matrix Kp defined 
above, it is non-trivial to find suitable parameter transformations 9i 9i, 
T f which makes the transformed Fisher information \F{6,f)\ a constant. 
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This can be seen by retracing the steps in the computation of \F{6, f)| shown 
in the appendix. We present a workaround on this problem below. Define 




N 


i=l 


wM 


\u\i\ 


1 < k < n 




(225) 


and the change of variables 


x = W^x, e = w^G, 

X = diag(t“^)i<i<rf x, 

6 = {TKoT^/^e = diag(f-^)i<i<rf e (226) 

We may now use our previous results Theorem 4.1, Corollary 5.1 to compute 
an approximation to the marginal distribution m^^{x) of the deconvolved 
data X in (221). It is easy to verify by inspection of the proof of Theorem 
4.1 that the approximation result for the marginal density provided in The¬ 
orem 4.1 applies to the transformed data x and parameters 6 with minor 
adjustments. However, there are some important remarks to be made here: 

(1) As before, the parameters 6i, 1 < i < d are modelled as identi¬ 
cally and independently GGD distributed parameters with density 

We note that the empirical research on the modeling of 
image wavelet coefficients in the litterature [ML99] concerns pure 
wavelet bases, not mirror wavelet bases as in the current context, 
but we will here use the GGD model also for the case of mirror 
wavelet bases. _ 

(2) The transformed parameters 6i, 1 < i < d are independently dis¬ 
tributed, but not identically distributed. By the coordinate trans¬ 
formations defined in (226) we see that 

Oi ~ VIA ^ ~ l<i< d, (227) 

where 

Aj Xtf (228) 


with ti as defined in (225). Furthermore, we note that 

= aV20.. (229) 

(3) The proper definition on the SNR D , t\ in the current 


case of transformed data x and parameters 0 is 




del 


yd y . ~ 21 

Xi _ }^i=l C A 


n- 


n- 
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- - - fl(A,T), where Q(A,t) = - r. 

a ^ 




nr 


( 230 ) 


By applying the same mappings ipir) and (j){9i,f) defined in (39) to the 
current choice of coordinates x, 6 and going through the proof of Theorem 

4.1 provided in the appendix, replacing Xi by Xi and 0* by 6i, we see that 
our previous results generalize straightforwardly to the current case of non¬ 
white gaussian noise through the whitening transformation defined in (226). 
Because the MAP-estimator 9* defined in (159) is nonlinear, some care has 
to be taken to estimate the parameter A which is needed to estimate the 

Aj and the 9i, 1 < i < d: Given an initial model index vector 7 ^°^ we may 
define and thus initial estimates Aq, Tq and 0g- We define A* = A*tf 

and this may be used to compute the MAP estimates 9*i. Applying the 
model selection principle given in Proposition 8.1 to the whitened data x 

and their corresponding parameter estimates 9i will provide us with an up¬ 
dated model index vector 7 ^. Then the same iterative procedure applied 
previously in the case of white gaussian noise may be used to compute suc¬ 
cessive estimates A*, r*, 9* and through these we compute new estimates 

A* and 9*. 

The prior distribution ttj (9) in the current case of independently dis¬ 
tributed 9i, 1 < i < d where 7 r^( 9 i) is given in (158), becomes 




2=1 


n 

2=1 

d 


1 / 2 ' 


vrj{v)A, 

2r(i/R) 


exp 




2 = 1 








V2=l 


2 = 1 


AV20, 


(231) 


We note that the factor 0^=1 vanishes in the formula (49) for the marginal 
density m^^{x) because this factor is also included in the normalization 
factor C^^{x). To evaluate the model selection criterion CO defined in (104) 

in the current case of non-IID parameters 9i, we will use the MAP-estimator 
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A* in (183); 


A* 


{d + 2f/^ 







(232) 


and we then get 


C (®||(i)|S'i_i) -{n- d + 2) 




|;?||2 _ || s ;„||2 
I•^ll2 Il*^||ll2 


d 


d\xu{i)\ 


log 




1^- 


AA 


e*,x^ 


= —{n — d + 2)- 




UE n - UEll 


+ (d + 2) 




11^-1 


9|0*i(a;||(i))| 


^1*11 I 


1 < i < d. 




(233) 


The current criterion C'(-) in (233) is not as easy to minimize over the data 
X as in the previous case of IID parameters in (196). The explanation for 
this is as follows: Suppose 0 < i/ < 1, then even if |5||(i)| is large, it may 

still happen that d*i(x(i)) = 0 yielding C ("Sii(z)|S'i_i') = oo, because the 


MAP estimator 0*i is a threshold estimator with a threshold which grows 

with Aj = Xtf as may be seen from (166). We have observed that this 
effect is a real problem in our numerical experiments. As shown in the 
proof of Theorem 4.1, our marginal approximation formula is not valid for 
small parameter estimates 9 *, and so we cannot allow the selection of model 

indices i with 6*i = 0. To overcome this problem, we will adopt a possibly 
suboptimal model selection algorithm which we believe/hope is not far from 
the model selection procedure which minimizes C'(-) in (233); We will simply 

select the indices i with the largest estimates \9*i\. Now, we will skip the 
details on going through the proof of the Theorem 4.1 and making the 

necessary adaptations to the current case of non-identically distributed 9i, 
1 < i < d. However, the changes are straightforward and we list below the 
ones concerning the sufficient conditions on the numbers C) n, X under 
which the Theorem 4.1 on the marginal approximation and the Proposition 
6.1 concerning the marginal normalization constant, still both apply to the 

current model for the data x and parameters 6 defined above. We need 
upper bounds on the numbers C, k, X in order to check the validity of 
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the marginal approximation and the resulting codelength principle in our 
numerical work. Renaming X^X under the current 

model we have 

C=^ sup - 1\ ^< 1 (234) 

i<i<d I \a / ) 

where 14(A,r) is a signal to noise ratio (SNR) defined as 




nr 


-1 


.-h{v) 




n — d + 2 


1-C 


(235) 

(236) 


where 


h{u) 


V if 0 < p < 1 

Z//2 if 1 < < 2. 


and 
\ = 


Etur iti<.'<2. 


Then under the claims (234) and (236) we have the following bounds on the 
normalization constant C^^{x) and the upper bounds on the error terms k 
and ^ for the marginal expression m^^{z) in Theorem 4.1: 


isi<hi+c)^d4ixil>^ 

3 (|0(ANr*))^ 




E 

i=i 


u-l 


{T*)-2e* sgn(0*,) 1 + 


exp \T*{9*j)‘^ 


+ 


E 






(271)- 


+1 

where 


1 

* 

* 

i0*)(£||(j)-i0^) 

exp 

(0R)2 + (0*^)2 

) 


Ar(A*,i/,7d)~ 


n—d~\-2 
2 ’ 

n—d-\-2 1 | ^ 

2 4 


if 0 < 1/ < 1 
if 1 < p < 2. 


( 237 ) 
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1 <^ + l < 

d 


i=l 


n 1 + 


2Pg sup7rA=i(t)/vrA=i 

teR 


inf 


1 + erf I T 2 \x\\ {i 


tel 0,no.i( 




7rA=i uo,i r.^xu{i) 


2C^u~ i:r , ,, tG(«o,i(r2 a:||W),oo 
271 2 


n -1 


7rA=l Uo,i T.^Xu{i) 


if 


7rA=l Uo,i T^Xii{{) 


(2vr) 


(238) 


> ( T2^®||(i) ) , y i&7d 


where ri,r 2 G Ir, wo,i(s) = ^ |s| and 


r / i A del , 
Liy,i I T^X I = < 


T2X 


u-1 


((?SSi(A,T))- 




1 + 


1 

T^X 


(tl¥KKr)) 


if 0 < 1/ < 1 

if 1 < u < 2. 
(239) 


We also note that the INMDL-optimal quantization principle given in Propo¬ 
sition 11.1 only applies to the transformed parameters 9 and not 0 be¬ 
cause the Laplace approximation used to estimate the marginal distribution 
m^^{x) in Theorem 4.1 was deduced under the assumption of IID gaussian 
noise. 


3. Numerical methods and experiments 

We applied the INMDL-principle to the deconvolution problem defined 
above for some test images and compared the results to the thresholding 
algorithm proposed in [KM03]. We define the total variation measure || • \\tv 
Af-l 

ll^lltf ^ [(0[m, n-|-1] — 0[m, n])^-|-(0[m-t-1, n] — 0[m, n])^] 

m,n=0 

(240) 

for data 9 G It may be used to compare the smoothness of the 

original, degraded and estimated datasets. 
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(1) The noise variance is set to = 1 in all the experiments 
on graylevel images below, with the graylevel values ranging in the 
integer range [0,255]. 

(2) The definition of the SNR is the same is in the previous experimen¬ 
tal section on estimating in white gaussian noise. 

(3) The method of thresholding in a mirror wavelet basis [KM03], 
[KMR03] is denoted MWT below. We note that the MWT imple¬ 
mented here does not include a shift-invariant estimation as used 
in the works cited above. This is due to lack of time to implement 
the required numerical wavelet-routines. 

(4) We note that a Fourier cutoff frequency kc with kc = y— 8 was used 
to cut the the deconvolved data in the Fourier domain because it 
was needed in the MWT algorithm to stabilize the algorithm. We 
note that the INMDL algorithm was found to yield the same results 
with no Fourier cutoff. 

(5) The wavelet used was the Symmlet of filter length 20 in all exper¬ 

iments in this section. We note that in all the image experiments 
below, the numbers: k, C, a;, X, infi<j<(i I defined above 

on which bounds are needed to ensure the validity of our asymptotic 
marginal formula in Theorem 4.1, Corollary 5.1 and the marginal 
renormalization constant in Proposition 6.1, were found to range 
in intervals approximately as stated in the previous experiments 
section. 

We wanted to investigate the performance of the INMDL-algorithm us¬ 
ing a harder blurring operator, for example operators given by box car con¬ 
volution filters. Since the frequency response of such a filter is a sinc-function 
with multiple zeros in the frequency domain, the MWT-algorithm is not ap¬ 
plicable in this case. We have implemented a INMDL-based method which 
uses an adapted wavelet packet basis B where the basis is adapted to both 
the degraded input data y in (216) and the deconvolution filter u~^ in (218). 
The only difference to the INMDL-based deconvolution algorithm defined 
above for the hyperbolic filters, is that the mirror wavelet basis is exchanged 
for a specially chosen wavelet packet basis B*. We briefly outline below the 
main ingredients in the process of selecting a suitable basis B* and refer to 
[Wic94] and [Mal98b] for details on wavelet packet bases. 

(1) A wavelet is chosen (Symmlet 20 in our case) and the degraded data 
y is expanded into some (not full) constrained anisotropic wavelet 
packet analysis on see [Wic94]. An additive cost measure 

[CW92] is specified, we used here the entropy-measure Se- 

n 

Se{y) ^ yf log yf. (241) 

i=l 

Then {Bfy) is evaluated for all the allowed discrete wavelet 
packet bases S* on for the given wavelet (S20) using the fast 
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’’best basis algorithm” of [CW92]. We note that the total number 
of different wavelet packet bases exceeds 2^/^, [CW92] where N = 
n X n for image data. However, the ’’best basis algorithm” ensures 
that the unique basis minimizing the additive cost measure is found 
in 0{N log 2 N) operations. 

(2) A constraint is imposed on the search for the optimal wavelet packet 
basis: Wavelet packet subspaces W spanning a Fourier frequency 
rectangle R [k^\ x [k^\ k^'^] where the deconvolution filter 
u~^{kx,ky) ’’varies too much” are marked as not selectable. We 
used here the restriction 

^ ^ ^242) 

^{kxiky) 

with Q = 16, this value on Q corresponds to the variation factor 
of the kernel cos^{kxTi/\fN) cos^{ky it/'/N) (used in [KMR03]) in¬ 
side the different subspaces of the mirror wavelet basis. It is easy 
to realize that there exists wavelet packet bases B of which the 
subspaces W satisfies the constraint (242) because each wavelet 
packet basis element has an essential frequency support inside a 
frequency box x Rjy,my C / [-T,7r] x [-7r,7r] with 

Hpf • • • • 

= [-Tr2^'"{mx + 1), -TT2^^mx] U [7r2'’^m2,,7r2'’“^(ma; -b 1)], 

- log2n < jx,jy <0, 0 < ma; < 2~^^, 0 < ruy < 2~^y (243) 


and there exists an injection from the collection of different tilings 
of the frequency square I by elements Rj^,m^ x Rjy,my into the col¬ 
lection of different discrete wavelet packet bases on [Wic94], 

[Mal98b]. 

(3) A diagonal estimate K* of the covariance matrix of the decon¬ 
volved noise represented in the selected wavelet packet basis B* is 
computed similarly to the case of the mirror wavelet basis shown 
above. We note that since the blurring kernels used in the model 
of the degradation process are separable, the required numbers 
rDfc), where K is the discrete Fourier representation of the 
covariance K of the deconvolved noise and is the discrete Fourier 
transform of a wavelet packet basis element may be computed 
fast with 0(\/iV) operations for each k. 

We compared the INMDL-deconvolution in the adapted basis defined 
above to the performance of the Wiener filter Ra{kx, ky) 


Ra{kx,ky) = 


1 


1 “b rr 


(T'^\U^{k^)\--^\Uy{ky 

\Pe{k.,ky)\ 


(244) 


where Ux and Uy denote the Fourier transforms of the convolution filters 
Ux and Uy applied along rows and colums of the image, respectively in the 
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degradation process (216), Pq denotes the power spectrum of the unknown 
signal 6 in (216) and 0 < a < oo is a regularization parameter. We applied 
the iterative algorithm given in [CH91] to estimate Pq. The results on test 
images are shown in Figure 3 and Figure 4. We also tried the INMDL- 
deconvolution algorithm on a high resolution optical gray level image taken 
by satellite Ikonos, this is shown in the test image Lillesand in Figure 4. 
The bitdepth of the image is 11, and the pixel resolution is 1 meter. 


3.1. Discussion of experimental resnlts. The results shown in Fig¬ 
ure 1 and Figure 2 show that the INMDL-based restoration algorithm per¬ 
forms slightly better than the MWT-method. However, the blurring of 
the images imposed by the kernels cos^*' ^kyir/y/iv'^ is not 

very hard as may be seen from the Figure 1 and Figure 2. Also, much 
better restoration results using a MWT-method are reported in [KM03], 
[KMR03], but this difference from our reported results is likely due to a 
post-processing of the MWT-estimates by the ’’spin-cycling”-method [CD95] 
yielding a shift-invariant estimate. Unfortunately, we have not had the time 
to implement this important stage of the estimation process, but we have 
no reason to believe that the INMDL-based estimates would not benefit as 
much from this kind of posterior regularization techniques as is shown to 
be the case for the MWT-method in [KM03]. Therefore, the experimental 
results obtained here for the INMDL and MWT estimators, although not 
impressive in performance, we believe they may be used to compare the (po¬ 
tential) performance of the MWT and the INMDL-based estimators. Our 
conclusion is then that the INMDL principle offers an alternative deconvo¬ 
lution technique which compares favourably to the MWT method. 

In the case of using the INMDL-principle to restore images degraded 
by a box car filter, we had to use a basis which approximately diagonal¬ 
izes the covariance matrix K of the deconvolved noise. For this purpose 
we used a certain type of anisotropic discrete wavelet packet analysis for 
[Wic94], [Mal98b] together with the best basis algorithm [CW92] 
and some constraints on selectable wavelet packet bases as explained above. 
Figure 3 shows that both the Wiener and the INMDL-estimate visually suf¬ 
fers from the same kind of global ripple artifacts. The explanation for this 
in the case of the Wiener filter is of course that the each Fourier basis el¬ 
ement has a support equal the entire spatial (pixel) domain. In the case 
of the INMDL-estimate the explanation is that the method outlined above 
of not allowing wavelet packet bases with elements possessing a Fourier fre¬ 
quency support over which the deconvolution filters U~^{kx) or U~^{ky) 
are not ’’approximately constant”, favours the selection of wavelet packet 
bases with (at least some) basis elements of high frequency resolution (small 
frequency support) and thus these basis elements must have a large spatial 
support. 
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Finally, we note that [BCN99] has reported results obtained with a 
hybrid method where one first preprocess the deconvolved data with a col¬ 
lection of adaptive Wiener filters {Ra.}j^j, 0 < aj < 1 in the Fourier 
domain, and then one estimates the signal in the wavelet domain from the 
Fourier-regularized data by the universial thresholding scheme [DJ94]. The 
constructed hybrid estimator is shown in the cited paper to outperform the 
ordinary Wiener filter in experiments. Thus, one possible approach to 
improving the performance of the INMDL-deconvolution principle as de¬ 
fined and tested above, would be to apply some kind of regularization in 
the Fourier domain (or possibly in a suitable wavelet packet domain) to 
the deconvolved data, and then denoise the deconvolved data in a suitable 
wavelet/wavelet packet basis. To incorporate such a regularization in a 
codelength principle is a topic for future research. 
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£NR - 22.6 cB, R- • 1 .1% SNR - 21«B, R - 16.2% 



Figure 1. Comparison of estimator performance for IN- 
MDL and the MWT algorithms on 8-bit grayscale test im¬ 
age barbara with N = 512 x 512. The noise variance is 

1 1 1 /~\TT T ir\ -i- r-\ /-li^ /tt* <-» /-I F V» i ■i'>n <-» rc/-^ 
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(a) Original cameraman, TV = 9.34x10'* 


(b) Degraded cameraman, 

TV = 5.00x10^ SNR = 21.0dB 




(c) INMDL-estimated cameraman, 

TV = 7.47x10"*, SNR = 22.8 dB, R= 11.1% 


(d) MWT-estimated cameraman, 

TV = 6.53x10*, SNR = 22.0 dB, R = 15.6% 




Figure 2. Comparison of estimator performance for IN- 
MDL and the MWT algorithms on 8-bit grayscale test im¬ 
age cameraman with N = 256 x 256. The noise variance 
is = 1.0 and the lowpass filter used to degrade the im¬ 


age was: 


cos 


:PX 


ikxTr/vNj cos^y (kyTr/\/N 


in the Fourier 


domain with Px = Py = 3.0. TV denotes the total variation 
(240) and R is the fraction of nonzero wavelet coefficients in 
the estimate. The GGD shape parameter was estimated by 
the INMDL-algorithm to: u = 0.556. 
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(a) Original cameraman, TV = 9.34x10^ 


(b) Degraded cameraman, TV = 2.90x10^ 
SNR = 15.2 dB 




(c) Wiener-estimated cameraman, 
TV = 9.33x10", SNR = 19.3 dB 


(d) INMDL-estimated cameraman, 

TV = 7.42x10", SNR = 18.2 dB, R = 4.8% 




Figure 3. Comparison of estimator performance for IN- 
MDL and the Wiener filter algorithms on 8-bit grayscale test 
image cameraman with N = 256 x 256. The noise variance 
is = 1.0 and the lowpass filter used to degrade the image 
was a 9 X 9 box car filter. TV denotes the total variation 
(240) and R is the fraction of nonzero wavelet coefficients 
in the estimate. The GGD shape parameter was estimated 
by the INMDL-algorithm to; u = 0.667. The Wiener filter 
regularization parameter used was a = 1.0. 
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!d> IKMDL-^itn^d L Hasand liiidja. 
TV-4.26Kl0P,SWR-22.6d^ R-6.9% 



Figure 4. Comparison of estimator performance for IN- 
MDL and the Wiener filter algorithms on 8-bit grayscale test 
image Lillesand with N = 512 x 512. The noise variance is 

^7- 1 - 1 O n VI n -("Uw IwVTTWnCid ni + WV* inC/vn 4"-(■V.W 1 wi n .V/V 








Bibliography 


[AS70] 

[Bal96] 

[Bal97] 

[BCN99] 

[BG95a] 


[BG95b] 

[BG95c] 

[BRY98] 

[GD95] 

[GH91] 

[GRM98] 


[GT91] 

[GVOO] 

[GW92] 

[Dau92] 


M. Abramowitz and LA. Stegun, Handbook of mathematical functions, Dover, 
1970. 

V. Balasubramanian, A geometric formulation of Occam’s razor for in¬ 
ference of parametric distributions, Tech, report, Princeton Univer¬ 
sity, Jan. 1996, Princeton preprint PUPT-1588. Available online at: 
http: //schwinger .harvard, edu/~vijayb/. 

_, Statistical inference, Occam’s razor, and statistical mechanics on the 

space of probability distributions, Neural Computation 9 (1997), no. 2, 349-368, 
Available online at: http://schwinger.harvard.edu/~vijayb/. 

R.G. Baraniuk, H. Choi, and R. Neelmani, Wavelet-domain regularized decon¬ 
volution for ill-conditioned systems, IEEE Image Processing, 1999. ICIP 99. 
Proceedings. 1999 International Conference on 1 (1999), 204-208, Available 
online at: http://citeseer.nj.nec.com. 

A. Bruce and H. Gao, Understanding waveshrink: Variance and bias estimation, 
Tech, report, StatSci Division of MathSoft Inc., 1995, In preparation. Bruce, A. 
G. and Gao, H.-Y. (1995b). Understanding WaveShrink: Variance and Bias 
Estimation. Technical report, StatSci Division, MathSoft, Inc., 1700 Westlake 
Ave. N, Seattle, WA 98109-9891. 

_, Waveshrink: Shrinkage functions and thresholds. Tech, report, StatSci 

Division, MathSoft, Inc., 1995, Proc. SPIE, San Diego, CA, 1995. 

_, Waveshrink with semisoft shrinkage, Tech, report, StatSci Division of 

MathSoft Inc., 1995, Bruce, A. and Gao, H., WaveShrink with Semisoft Shrink¬ 
age. StaSci Research Report No. 39 (1995) . 

A. Barron, J. Rissanen, and Bin Yu, The minimum description length principle 
in coding and modeling, IEEE Transactions on Information Theory 44 (1998), 
no. 6, 2743-2760. 

R.R. Coifman and D.L. Donoho, Translation-invariant denoising, Tech, re¬ 
port, Stanford University and Yale University, 1995, Available online at: 
www-stat.Stanford.edu/~donoho/reports.html. 

R. T. Chin and A. D. Hillery, Iterative wiener filters for image restoration, IEEE 
Transactions on Signal Processing 39 (1991), no. 8, 1892-1899. 

I. Cohen, S. Raz, and D. Malah, Mdl-based translation-invariant de¬ 
noising and robust time-frequency representations, Proc. of the 4th 
lEEE-SP Int.Symposium on Time-Frequency and Time-Scale Analysis, 
Pittsburgh, Pennsylvania, 6-9 Oct. 1998. (1998), Available online at: 
http://citeseer.nj.nec.com/cohen98mdlbased.html. 

T.M. Cover and J.A. Thomas, Elements of information theory, Wiley, 1991. 
Bin Yu Chang, S.G. and M. Vetterli, Adaptive wavelet thresholding for image 
denoising and compression, IEEE Transactions on Image Processing 9 (2000), 
no. 9, 1532-1547. 

R.R. Coifman and M.V. Wickerhauser, Entropy-based algorithms for best basis 
selection, IEEE Transactions on Information Theory 38 (1992), no. 2, 713-718. 
I. Daubechies, Ten lectures on wavelets, Siam, 1992. 


93 



94 


BIBLIOGRAPHY 


[DJ94] 

[DJ95] 

[DJ98] 

[DV02] 

[GROO] 

[HYOO] 

[KM03] 

[KMR03] 

[KTK88] 

[LanOl] 

[Mal98a] 

[Mal98b] 

[ML99] 

[OB94a] 

[OB94b] 

[OH94] 


[Ris96] 

[Ris98] 

[RisOO] 

[RisOl] 

[Sai94] 

[SteSl] 


D.L. Donoho and I.M. Johnstone, Ideal spatial adaptation by wavelet shrinkage, 
Biometrika 81 (1994), no. 3, 425-455. 

_, Adapting to unknown smoothness via wavelet shrinkage, Journal of the 

American Statistical Association 90 (1995), no. 432, 1200-1224. 

_, Minimax estimation via wavelet shrinkage. Annals of Statistics 26 

(1998), no. 3, 879-921. 

M.N. Do and M. Vetterli, Wavelet-based texture retrieval using generalized 
gaussian density and kullback-leibler distance, IEEE Transactions on Image Pro¬ 
cessing 11 (2002), 146-158. 

I. S. Gradshsteyn and I.M. Ryzhic, Table of integrals, series, and products, sixth 
edition. Academic Press, 2000. 

M. Hansen and Bin Yu, Wavelet thresholding via mdl for natural images, IEEE 
Transactions on Information Theory 46 (2000), no. 5, 1778-1788. 

J. Kalifa and S. Mallat, Thresholding estimators for linear inverse problems and 
deconvolutions. Annals of Statistics 31 (2003), no. 1, 58-109, Available online 
at: http://projecteuclid.org/. 

J. Kalifa, S. Mallat, and B. Rouge, Deconvolution by thresholding in mirror 
wavelet bases, IEEE Transactions on Image Processing 12 (2003), no. 4, 446- 
457, Available online at: www.cs.nyu.edu/cs/faculty/mallat/biblio.html. 

R. E. Kass, L. Tierney, and J.B Kadane, Asymptotics in bayesian computation, 
Bayesian Statistics 3, vol. 3, Oxford University Press, 1988, pp. 261-278. 

A.D. Lanterman, Schwarz, Wallace, and rissanen: Intertwining themes in theo¬ 
ries of model order estimation. International Statistical Review 69 (2001), no. 2, 
185-212. 

S. Mallat, Applied mathematics meets signal processing, 1998, Available online 
at: WWW. cs .nyu. edu/cs/f aculty/mallat/biblio .html. 

S. Mallat, A wavelet tour of signal processing. Academic Press, 1998. 

P. Moulin and J. Liu, Analysis of multiresolution image denoising schemes using 
generalized gaussian and complexity priors, IEEE Transactions on Information 
Theory 45 (1999), no. 3, 909-919. 

J.J. Oliver and R. Baxter, Mdl and mml: Similarities and differences. Tech, 
report. Department of Gomputer Science, Monash University, 1994, Available 
online at: http://citeseer.nj.nec.com/cs. 

_, Mml and bayesianism: Similarities and differences. Tech, report, De¬ 
partment of Computer Science, Monash University, 1994, Available online at: 
http://citeseer.nj.nec.com/cs. 

J.J. Oliver and D. Hand, Introduction to minimum encoding inference. Tech, 
report. Department of Computer Science, Monash University, 1994, Available 
online at: http://citeseer.nj.nec.com/cs. 

J. Rissanen, Fisher information and stochastic complexity, IEEE Transactions 
on Information Theory 42 (1996), no. 1, 40-47. 

_, Stochastic complexity in statistical inquiry. World Scientific, 1998. 

_, Mdl denoisinq, IEEE Transactions on Information Theory 46 (2000), 

no. 7, 2537-2543. 

_, Strong optimality of the normalized ml models as universal codes and 

information in data, IEEE Transactions on Information Theory 47 (2001), no. 5, 
1712-1717. 

N. Saito, Local feature extraction and its applications using a library of bases, 
Ph.D. thesis, Yale University, Department of Mathematics, 10 Hillhouse Avenue, 
P.O. Box 208283 New Haven, CT 06520-8283, december 1994, Available online 
at: http: //www.math. yale. edu/pub/papers/. 

C.M. Stein, Estimation of the mean of a multivariate normal distribution. An¬ 
nals of Statistics 9 (1981), no. 6, 1135-1151. 



BIBLIOGRAPHY 


95 


[TK86] 

[TKK89] 

[Vid98] 

[WF87] 

[Wic94] 


L. Tierney and J.B Kadane, Accurate approximations for posterior moments and 
marginal densities, Journal of the American Statistical Association 81 (1986), 
82-86. 

L. Tierney, R.E. Kass, and J.B. Kadane, Fully exponential laplaee approxima¬ 
tions for expeetations and variances of nonpositive funetions, Journal of the 
American Statistical Association 84 (1989), 710-716. 

B. Vidakovic, Nonlinear wavelet shrinkage with bayes rules and bayes factors, 
Journal of the American Statistical Association 93 (1998), 173-179. 

C. S. Wallace and D.M. Freeman, Estimation and inference by eompact coding. 
Computer Journal 11 (1987), 185-194. 

M. L. Wickerhauser, Adapted wavelet analysis from theory to software, A K Pe¬ 
ters, 1994. 




APPENDIX A 


Notation and definitions 


(1) Let ^ be a set, then denotes the collection of all strings with n 
elements taken from A. 

(2) [x]u is the physical dimension unit of the real variable x, that is; 
[2 meter] „ = meter . 

(3) [x\v is the number of the real variable x, that is; [2 meter]„ = 2. 

(4) logx is the natural logarithm of x, that is; x = exp (logx), Vx G 


(5) log„ X is the logarithm of x in base a, that is; x = Vx G M+, 

a > 1. 

(6) For X G and 1 < p < oo, define the norm ||a;||p by; ||a;||p = 

(El. 

(7) For countable sequences {afc}^_^ of real or complex numbers, 
define; ||afc||p 

(8) For functions / ; —> C and 1 < p < oo define the norm 

by: ll/llp (LgKn \f{x)\P dxY^^. 

(9) Define {{afc}^_oc. : Ikfellp < oo}. 


(10) Define LP(M”) = {/ ; —> C ; ||/||p < oo}. 

(11) Let C'^(M”) denote the space under addition of functions f{x) : 

—> C with k continuous derivatives. 

(12) For functions / ; —> C such that / G L^(M”) define the 

Fourier transform T ; L^{W^) —> (^(M”) by; /(^) 

(27r)-V2 f(x) exp(-ix ■ ^) dx, $ G M”. 

(13) For X G define the discrete Fourier transform T{{xm}m=i) 

by; x[k] J^({xm}m=i)[^] N-^/‘^J2m=i^rnexp{-27rimk/N), 

—N/2 < k < N/2. This definition is extended to countable se¬ 
quences {am} G A by; d{aj) '^= T{{am}m=-oo){‘^) 

'= (27r)-i/2 Y}m=-oo amexp{-imuj), u) G [-7r,7r). 

(14) For functions / ; M” —> C such that / G L^(R"') define the 

Fourier transform T ; L^(M”') —> C'(M"') by; /(^) •7^(/)(^) 

( 27 r)-i /2 exp{-ix ■ ^) dx, \ G M*". 
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A. NOTATION AND DEFINITIONS 


(15) For column vectors x,y £ , define the inner product (•,•) : 

X —>C by: {x, y) '^=y'^x = YJk=i ^kVk- 

(16) For sequences {ak},{bk} G define the inner product (•,•) : x 

-> C by: {ttk, bk) Yjn=-oo ^nbn- 

(17) For functions f,g^ L^(M"') define the inner product (•, •) : L^(M"') x 

L2(M”) —> C by: {f,g) f{x)g{x) dx. 

(18) For sequences {afc},{6fc} G define the convolution operator * : 

t X by: (a * b)k E“=-oo (^nbk-n- 

(19) For functions f,g ^ L^(R"') define the convolution operator * : 

LHM")xL1(M") ^ LHR-) by: f*gix) = 4^^. f{y)g{x-y) dy. 

(20) an = 0{bn) implies the existence of a constant A > Q such that 
^ < ^,V n > 1. 

On 

(21) a„ = o{bn) implies that lim^^oo ^ = 0. 

(22) Pq is the gaussian distribution function: 

f-oc exp dx. 

(23) erf is the normal error function: erf(x) exp (— dx. 

(24) F(x) exp(—t) dt, x > 0, is the gamma-function. 

(25) On is the set of all orthogonal real n x n matrices. 

(26) log* n log c -|- log n + log log n + log log log n + ■ ■ ■ for n G N 
where the sum includes all positive iterates and c ~ 2.865 is a 
normalization constant. 

(27) Let I CR denote an interval, then /^ = {x = (xi, X 2 ,..., x^)'^ G 

: Xj G /}. 



APPENDIX B 


The mirror wavelet basis 


The degradation process of data 6 is modelled as 

y = u * 6 + 7] (245) 

where u is a known lowpass filter and rj is IID gaussian noise and * denotes 
the convolution operator. Let U G denote the discretized circular 

convolution operator representing the smoothing degrading on the data 9 
by the lowpass filter u 

y = U6 + rj (246) 

where U is the smoothing matrix representing the smoothing operation per¬ 
formed by lowpass filter u, 6 is the parameters we want to estimate and t] 
is white gaussian noise. After deconvolving with inverse operator U~^ we 
have 

X = U-^y = 6 + U-^rj (247) 


Now, the noise Z = U is non-white gaussian with covariance K 

K . (248) 

Circular convolution operators U are diagonal in the discrete Fourier basis 
Wp = [bk]^~Q where 5^ are column vectors with 

bk[n\ ^ exp , 0 < n < iV (249) 


It follows from this fact and (248) that the eigenvalues of K are given by 

al {Kbk, bk) = {W^KWpW^bk, W^bk) 

= {W^a^U-^U-^WFW^bk,W^bk) 

= {W^U-^WpW^U-^WpW^bk, W^bk) 

= a^(^{W^UWp){W^UW fY^ W^bk.W^bk) 


a 


\u[kW 


(250) 


Now, (249) shows that the Fourier basis elements have full support in the 
space domain and therefore this basis, while providing a domain where the 
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B. THE MIRROR WAVELET BASIS 


noise Z ~ AA ^0, diag ^ suitable for estimating Q 

in (216). The main idea in [KMR03], [KM03] is to construct a wavelet 
packet basis W = of where tpi[n] are supported in the space 

domain 0 < n < — 1, which approximately diagonalizes the covariance K. 

Assuming the sample space dimension A^ is a power of 2, we define 



Figure 1. Illustration copied from [KM03] of the mirror 
wavelet decomposition algorithm and the Fourier support of 
the mirror wavelet basis. Each branch in the decomposition 
tree represents a convolution with a filter h or g followed by 
decimation by 2. The fourier frequency index k is plotted on 
the first axis. The curved fat line growing from left to right 
shows the diagonal covariance matrix of the noise in the 
Fourier domain plotted as a function of the fourier frequency. 
The noise variance varies by a bounded factor which do 
not grow with N, on the frequency support of each mirror 
wavelet There is a critical frequency kc above which the 
noise variance is too high for reconstruction to be possible. 














B. THE MIRROR WAVELET BASIS 


101 


Given a conjugate pair of mirror filters h, g, iV-periodic discrete mirror 
wavelets 'ipj kiiT-] are defined from orignal A^-periodic discrete wavelets 'ipj kin] 
by 

4fc[n]"=^'(-ir-V,>[l-n] (252) 

where 

if^jin] ^ g[2^~^~^n]^j-i[n] and 4>j[n] h[2^n], L < j < 1 (253) 

/=o 

and (j)L[n] = Vn G 0,1, 2,— 1, and 

V;j.fc[n] = 4>j[n -N2^k], 0<k< 2“^ (254) 

and we also define 

iAi,oN = ^i,o[n] = Vn G 0,1, 2,A^ - 1. (255) 


We note that the Fourier support, supp ^l^j[n], satisfies 

supp 'ipj ^ [2“-^“^, 2“'^]. (256) 

The Fourier transform of the mirror wavelets by definition satisfies 


IV’i,fcNI = - n]| 

and by (256) we have 


(257) 


supp ^ [N/2 - 2-\Nl2 - 2“^“^]. 


(258) 


The mirror wavelet coefficients {f,'ipj,k), 1 > j > L + 1 are calculated from 
the finest scale wavelet coefficients (/, ^pL+l^k) by a cascade of convolutions 
and decimations by 2 with the pair of conjugate filters h, g as illustrated in 
Figure 1. (see appendix). Defining the discrete A^-periodic mirror wavelet 
basis W by 


W 




0<A:<2-J,L+l<j<l) 


['^j,k]o<k<2- 




(259) 


where 


=^iAj,fc[n], 'il^j,k[n\=^'ipj,k[n\. (260) 

we have by general properties of wavelet packets [Mal98b] that W is an 
orthonormal basis for Furthermore, it is proved in [KM03] that the 
covariance matrix K of the noise defined in (248) is nearly diagonalized in 
the mirror wavelet basis W for all N if the wavelet ijj has q > p vanishing 
moments where p is the order of the zero of the lowpass smoothing filter u[k] 
at the highest Fourier frequency k = ±A^/2. In [KMR03] one considers 
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B. THE MIRROR WAVELET BASIS 


smoothing filters u which have a Fourier transform u with a zero of order 
p > 1 at highest Fourier frequency k = zizN/2, that is 


u[k] 


2\k\ 


N 


we will here use the smoothing filter 

u[k] = cos^ 


- 1 


71/C 

Iv 


p>l 


, P > 1 


(261) 


(262) 


which have the type of smoothing behaviour that the mirror wavelet basis 
is designed to work with. We define the pseudo-inverse smoothing filter u~^ 


by 

^^[k] - 

Define 

xpikj'^^U^ 

xf\k]^^U^ 


del 


0 , 


if u[k] / 0 
if u[k] = 0 




del 




(263) 

(264) 

(265) 


and observe by (247) and (248) that the data Xj[k], Xj[k],0 < k < 

2~^,L + 2 < j < 1 are gaussian random variables with means 

0^^\k] and variances {Ktpj^k,'4>j,k) and {K'ipj^k,'ti’j,k), re¬ 

spectively. We then have by (250), (256) and (261) 




IV/2-1 


n=—N(2 


|u[n]P 


fj 


(266) 


Furthermore by (258), (261) we have 

-2 del 


Ar/2-l 


1^7 N I' 


n=-N/2 I ^ 

(267) 

We note that the noise variances (t|^, do not depend on the translation 
index k of the wavelets '^j,k- Using thresholding estimators as described 
in [DJ94], [BG95b] the estimators d^^\k], 6^^\k], 0 < k < 2~^ are 
given by a thresholding scheme on the wavelet expansions x^^^ of 

the deconvolved data x. The thresholds T, Tj used in [KM03] on x^^\k], 
x^^\k], respectively, are the ideal thresholds described in [DJ94] 


T *^=1^17^2 log, (iV/2) 


(268) 
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a^V^loge 

oo, 


if o-jV21oge(2 ^■) < % =%upjge |(/,V’i,fc)| 

otherwise. 

(269) 


that is the same constant threshold T d^ned above is used on all of the 
N/2 low frequency wavelet coefficients < k < 2~^, L + 2 < j < 1, 

and the threshold Tj is used on the high frequency mirror wavelet coef¬ 
ficients x^^\k],0 < k < 2~^, L + 2 < j < 1, where the noise variance 


E ^[k]j may be approximated by aj defined in (267) on 

each subband; spanQ<^^ 2 -i' 0 i,fc- We note that the hard thresholding func¬ 
tion is the MAP-estimator for GGDjy-distributed with u = 1, i.e 

Laplace-distributed. 


)iw). 


There are some remarks which should be made on the mirror wavelet 
deconvolution algorithm as presented above. The set 0 over which the 
numbers sj defined in (269) are computed, is in [KM03] taken to be the set 
0tv of bounded discrete total variation: 

0tv : ll^lltv J] Mn] - 0[n - 1]| < c| (270) 

I n=0 J 


where C > 0 is some universial constant. Then it is shown in [KM03] that 




(271) 


The critical scale 2'^ is defined as the smallest scale such that Tj = oo for all 
scales 2^ with 2^ > 2“^. The mirror wavelets V’c,fc on the critical scale have 

a Fourier transform ^ whose support is essentially at Fourier frequencies 

def 

\k\ > kc = iV/2 — 2“"^, this is illustrated in Figure 1. This implies the 
existence of a cutoff Fourier frequency kc for thresholding estimators and 
so we can replace the pseudo inverse smoothing filter u~^ in (263) by a 
truncated pseudo inverse u~^ defined by 


def 


if \k\ < kc 

0, otherwise 


(272) 


Also, in the numerical experiments in [KMR03], [KM03] one uses the 
translation invariant thresholding algorithm [CD95], however we have not 
had the time to implement this algorithm, and so we stick to the ordinary 
thresholding algorithm in our numerical experiments in this thesis. 

The restoration algorithm may then be summed up as follows: 

(1) Estimate the variance E{'rf‘) of the white gaussian noise tj in (216). 
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(2) Decide on the order p of the smoothing filter u[k] = cos^{Trk/N) 

in (216) and on the numbers Sj supjg 0 \ {f,'ipj^k)\, alternatively 
decide on a critical frequency kc- 

(3) Expand the given data y into the Fourier basis and deconvolve 

def 

the transformed data y in the Fourier domain by computing x = 
y[k] ■ u~^[k], 0 < k < N where u~^ is the truncated pseudo in¬ 
verse smoothing filter defined in (272) and apply the inverse Fourier 

transform on the result to yield x J-~^{x). 

(4) Calculate the variances a^, a'j, L + 2<j<lof the mirror wavelet 

basis expansion coefficients W'^Z of the deconvolved noise Z 
u~^ * rj, or their approximations in (266), (267). 

(5) Expand the deconvolved data x into the mirror wavelet basis W and 

apply the thresholding operation T with thresholds T, Tj in (268), 
(269) on the transformed data W'^x in the mirror wavelet 

domain and apply the inverse mirror wavelet transformation W on 
the thresholded transformed data T(lF^x) to find the parameter 
estimate: 6 = WT{W'^x). 

The deconvolution estimator described above for signals have a separable 
extension to image data. The smoothing filter u in (216) is here a separable 
lowpass filter 

u[ni,n2] = Ui[ni]u2[n2]i Q<ni<N, 0 < n2 < N (273) 

with Fourier transforms ui[ki] and U 2 [A: 2 ] as in (261). The deconvolved 
noise has a covariance K which is diagonalized in a two-dimesional discrete 
Fourier basis Wp^F = [^^fci,fc 2 ]o<fci,A: 2 <A'' and it follows as in (250) that the 
eigenvalues of K are 

= {Kbk,,k„bk,,k2) = \^[k^]\2\^[k2]\2 

2\k2\ 

N 

A separable discrete mirror wavelet basis of is constructed from the 

one-dimesional discrete wavelets and scaling functions (l)j, L < j < 1, 0 < 
k < 2-^ by 

'ipf\ni,n2] <f)j[ni]ipj[n2], 'tpf\ni,n2] • 0 j[ni](/)j[n 2 ], 

i/'®[ni,n2] Tpj[ni]Tpj[n2], '0®[ni,n2] = N~^. (275) 

Defining the translates 

- 2^-^mi,n2 - 2 ^-^ 1712 ], a = 1,2,3, 



-2p2 

(274) 


(276) 
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then the family 








L<j <1,0<mi ,m2<2 


( 277 ) 


is an orthonormal basis of It follows from the definition (275) that 

the family Bq of lower frequency wavelets 


def 
Of) = 


{4' 


’ ^J,7ni,m2 ’ 




L+l<j'<l,0<mi,m2<2 


(278) 


have Fourier transforms which are essentially supported in the low frequency 
square [—N/4, N/4]'^ where the eigenvalues of K are constant to within 
a universial constant factor, and therefore the elements of B are approximate 
eigenvectors of K, whereas the family Bi of higher frequency wavelets 


Bi = B\Bo = 




( 1 ) 

L+l,mi ,m2 ’ 




( 2 ) 

L+l,mi ,m2 ’ 



0 <mi,m2<N/2 


(279) 


are not approximate eigenvectors of K and these are replaced by the familiy 
Bi of separable mirror wavelets defined by 


^ def 
Ol = 


[ni]V^j2 

,m2 [»2|} 


L < ji,j2 < l,(ii,j2) / (L + l,-L + l), 

0 < mi < 2“-^i, 0 < m2 < 

(280) 


with 'ipj as defined in (252). It follows that the family 


B Bo U Bi 


(281) 


is a discrete separable anisotropic wavelet packet basis for of approx¬ 

imate eigenvectors of K. The tiling of the Fourier frequency plane that 
results from the separable mirror wavelet basis B dehned in (281) is illus¬ 
trated in Figure 2. Like in (266), (267) one has 

2 <^f / \ = / 

j^Oi \ ^ m2'’ ^ J,mi,m2 / \ 


= a 




m ,n2=—N/2 


^jlj2 — (-^V’ii,mi'0i2i'^2’^’^ 2 ,^ 712 ) — {^'^ji,rai'4^j2,m2j'4^ji,mi'4^j2,m2) 


|lT[ni]|2|uJ[n2]|2 


fj^, a = 1,2, 3. 


(282) 


= a 


E 


m ,712=—N/2 


|lT[ni]P|u^[n2]P 


~ (J 


222pi(ii-.J^)22p2(i2-i) 


(283) 
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Figure 2. Illustration copied from [KM03] of the separa¬ 
ble mirror wavelet basis for functions of two variables and its 
Fourier support properties. Note that the mirror wavelet ba¬ 
sis segments the frequency plane (/ci, /C 2 ) into rectangles over 
which the noise variance varies by a bounded 

factor which do not grow with N. The gray rectangles cor¬ 
respond to the critical scales beyond which the thresholding 
sets all coefficients to zero. 


Using the same ideal thresholds as in (268), (269), we define the thresholds 

2^(2) T(2) 1 


r(2) uV21oge (iV2/4) 


f. . / dji22\/21ogg (2-T-i2)^ if djij2\/21oge(2 n 12 ) < 

1 00 , otherwise. 


( 285 ) 
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where 


def 

^jl,j2 ~ sup 

/ee 


f, V’n,miV’ 


Jl,mi H^j2^'tTL2 


(286) 


Critical scales 2 '^^, 2^'^ are defined by; For each scale ji, define 2^^ as the 
smallest scale such that 2 ^^ > 2 ^^ implies = oo, and for each scale j 2 

define 2 ^^ as the smallest scale such that 2 ^^ > 2 ^^ implies ^Jl ,72 = oo. These 
critical scales are illustrated in Figure 2. Critical frequencies /Cci, may 
then be deduced as in the one-dimensional case by kc^ = iV/2 —2“*^% i = 1, 2 
and so may truncated smoothing filters ui, u^- 




APPENDIX C 


Calculation of Fisher matrix for the likelihood 

function 


We have from (38) and (39) the definitions 

n 

f{x\e,T) = (i) = exp (-^Ikxlp) exp - 9||=) ^ 

r = V’(f), V'(O) = To, di = ())(6»i,f) = 1 < ^ 

Define the reparameterized log-likelihood function L as 


L(®, f, e) log f{x\^{f), ^(0)) 

= ^logV’(f) - ^V’(T)||®±f - ^V'(f) ®|| -f^(f)“^/^0 


We compute the required partial derivatives of L and get 


= ipir) [®||(A:) — T^'ip{fy 


T2'0(f) 


= —T 


dL{f,e) 
d6k 

d‘^L{f,e) 

eel 

a^Uj.e) ^ Mil 

dOkdr 2 dr 

dL{f,0) nl2 dy{f) 1 dtpif) 2 ^ dy{f) 

- -Il^xll - o' 


- 1/2 


X 


\\(k) 


dr 


y{f) dr 2 df 


2 df 


X 



||— T2'0(f) 

..dy{f 


df 


1 52^(f) 

2 9f2 

1 d'>p{f) 

2 5f 


*1 


— T2'0(f) 


^ (a:||(i)-r 2 V'(f) ^/^0(i)) (^r 2 V>(f) ^/^0(i)^^^ 
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i=l ^ 


dr 


1 11 
i=l 

_i - ( 9 ^' 0 (f) 

+T2i/)(r) 20(^). 


_1 , .-_3 (9l/'(f) 
T2iP{t) 2 0 (^) 


3_1 / 5V’(f) 




We take the negative expectation —Ex of the data x = 
to the likelihood /. Using -Ea:[®|| — - 

{n — d)'4){f)~^ and -Ea:||®|| — = d'4){f) 


£c_i_ + a;|| with respect 
= 0 and Uajllaixlp = 
we get 


del 

c = —Ex 


bk = -Ex 


def 

a = —Ex 


d^L{T,e) 

dOl 

d‘^L{f,G) 

dOkdf 

d‘^L{f,e) 

(9f2 


= r. 


= -x'rV’(^) 1 < A: < d. 

2 OT 


nl2 (^ 
'0(f)2 5f 






5f 


2 d 




2=1 


(287) 

(288) 

(289) 


We may now write the Fisher matrix F{6,f) of the reparameterized likeli¬ 
hood /(ai|i/)(f), cj){0, f)) as the (d -|- 1) x (d -|- 1) matrix 



i “ 

bi 

62 ... 

bd \ 


6 i 

Cl 

0 ... 

0 

F = 

62 

0 

C2 • • • 

0 


bd 

0 

0 ... 

Cd / 


(290) 


where the only nonzero elements of F are located on the first row and the 
first column and the diagonal. The determinant of the matrix in (290) is 
easily verified to be 


det F 





and since in this case Ci = c, 1 < i < d, we get 


(291) 


d 

det F = ac'^ 

2=1 


(292) 





C. CALCULATION OF FISHER MATRIX FOR THE LIKELIHOOD FUNCTION 111 


Plugging in (287), (288), (289) into (292) we get 


|F(0,r)|=f 

d 


( ^/2 f d'lpif) 


tP{t) 


1 


Sr , 


-2 f 

dr 


i=l 


— r 


d-l 


E 

1=1 


1 dTp(f) ^Y _ n/2 /^dTp(f)^ ^ 


2 7p(f) df 


V)(f) 


df 


(293) 


We finally verify that our calculated reparameterized Fisher matrix F sat¬ 
isfies the relation \F{6,f)\ = \ j'^F{6 ,t)J\ where J is the jacobi matrix 
induced by the transformations Oi ^ 4>{6i,f), 1 < i < d and f ^ '^{f). The 
Jacobian is 


( 

dipi^) 

diplj) 

8'ip{f) 


dipir) 

\ 

dr 

ddi 

802 


8Sd 

\ d4>{0i,T) 

d<l>{0i,T) 

84)01,7 

1 . 

84>{0i,t) 



df 

dSi 

802 


80d 


\ d(f){92,r) 

8(1)02,t) 

84>02,7 

1 

84)02,r) 



df 

801 

802 


80d 


d(t>{9df) 

84)0 d,r) 

84)0 d,7 

1 . 

84)0 d,r) 


\ 

df 

801 

802 


80d 

/ 


( fid 

0 0 

■■ 0 \ 






s 0 

•• 0 




= 

t2 

0 s 

•• 0 





V td 

0 0 

•• ^ / 





where U = —^r^dnedipif) 1 < i < d and s = T‘^'il){f) 2 . Since J 

is triangular matrix we have \ J\ = ip'{f)s'^. Now (38) yields after a trivial 
computation 

\F{e,T)\ = ^r^-\ (295) 

Thus we have 

Comparing (296) and (293) we see that |-F(0,f)| = | J^F'(0,r)J| is satisfied. 





APPENDIX D 


The Laplace approximation formula for the 

marginal 


Let Tm ’ {x, T, 6) denote the m’th degree Taylor polynomial expansion of 

^ as a function oi 9i, i = 1, ...,d about the points 0* and f*. Since the prior 
TT\{0) may not be smooth at 0 = 0 we will have to claim that 0* is nonzero. 
The 6 integration in (48) will have to be split up into the 2^ integration 
areas consisting of and the remaining 2^ — 1 ’’quadrants” which union 
is \ We assume that 6* G and as will become clear below, this 
assumption implies no loss of generality. Let ^ai,...as{x, f, 0) denote the s’th 
order partial derivative of 4* with respect to the ordered list of parameters 

ct = ai, ..., as- We write out the terms of t!^ ^(a;, f, 6) explicitely below. 

Define 


p(r\0*) 

-K 




ix,T,e) = - 


R 


{T*,e*) 

K 


ix,r,0) = Y, E 7 


^ f-ki df^def 

1 5^+^4>(£c,f,0*) 

jiki dfWe^ 


1= 
d oo 


*\k 




i=l j+k>K 


we may then write 

^{x,f,9) = f^{x,f*,e*) + R^f’^*\x,f,e) 


where 


ff’^*\x, r, 0 ) = Hx, r,0*) + ^,{x, f*,0*){T - f*) 

d ^ 

+ r,e*m - 01) + (®, R,e*){r - n 


2 = 1 


+ ? E L. A (“=. f t + E La 


2=1 


2=1 


0 *) 

(297) 


where we can safely omit cross derivative terms of type g., i ^ j be¬ 
cause of our IID modeling assumptions on the 9i and the functional relation 
9i = (p{9i,f). We note that the pure first order terms in 9i and f will vanish 
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because (ai, f*, 0*) = 0, 1 < i < d, and (®, f*, 0*) = 0 by definition 
of f* and 6*. We will approximate the innermost integral in (48) by com¬ 
pleting the squares in the parameters 9i in \x,f* ,6*) and integrate 

the resulting shifted quadratic exponential against a remainder polynomial 
P{f, 9) over the parameter manifold 0^. To prove that this method is sound, 

we have to show that the error due to the terms in \x, f, 0 ) not in¬ 

cluded in the remainder polynomial P(f, 0) can be made small enough. To 
do this we will have to analyse the relative magnitudes of the coefficients of 

to find the leading order terms. In order to explicitely express the 
dependency of terms of ^ on f and f we use equation (46) and the 

chain rule to rewrite partial derivatives of <I> with respect to the parameters 
0i and f as combinations of partial derivatives of 4>. By completing squares 
in f and 0i, 1 < i < d and omitting the zero first order terms we may rewrite 
T 2 as 




* \2 


(x,f,e) = <i>(x,f*,9*) + -^^^^{x,f*,e*){f-f*) 


2 = 1 


2 


Oi - e* + 


4>,_,;(*,f*,r)(f-f*)' 




Now we may write (48) on the form 

= h+h 

where 




(298) 


Ii r 2 exp ( —<h(ai, f*, 0’' 


df exp — 


' T^It 




i=l 


* \2 


(f-f*) 




exp 


4 E [*»„», 

2 = 1 


{x,f*,e* 


0i - 0 * + 


Y def = ^ 

h = T2 





exp ^(ai, f, 0)) dO. 


dO. 


(299) 

(300) 
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We begin with the calculation of the 0-part of the integral Ii. We will first 
introduce som notation and some claims. Define 

Ta( 0) - log[7rA(6»)]^ = - log C\^ exp ^ (301) 

where C > 0 is a normalization constant independent of A. We claim that 
f{9) is an integrable, symmetric, function of 0, such that 

lim /(A20 )=oo, (302) 

|0|—>oo 

and such that there exist numbers 0 < u < 2, < B^, G M, Ci/ > 0 with 

the properties 

B', < f{xh) <B, + C,\xh\’^,y [9], G M, (303) 


and 

,l<fe<oo. (304) 

Because of (304) we have 

[^Ta(0)1 < \c,u{u-l)X\X^9r^] ,V[0], gM. (305) 

L Ji, 

Define the signal to noise ratio (SNR) D by the power ratio in the data 
model (25) 


s del 

n-z 


(signal to noise ratio) 


(306) 


we may deduce from (305), (306) and the fact that the likelihood is gaussian, 
the inequalities 


<[T*]vU + ^n-\X,T*)\xh*\'^-^C^u{u-l)\ , ifl<u< 2 (307) 


>[T*]^(^l + ^n-\X,T*)\xh*\‘'-‘^C^u{u-l)^, if0<i/<l. (308) 

We define 

AiA,!/(D di) -n~^{X,T)\X^9i\'"~‘^Cuv{u - 1) 
n 

= QD(A,r)^ |r5 6'i|''“^C',,i/(i/- 1) 


(309) 

(310) 
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and we claim there exists a number 0 < ^ < 1 such that 

<1, l<i<d. 


( 311 ) 


We will investigate this claim further below. By (307) and (308) we see that 
if the SNR-value fl(A, r) is high enough and the relative model size ^ small 
enough, then the value of ,0*) may be approximated by [r*]^ 

for all practical purposes for the actual value of v under consideration and 
’’reasonable” \\^9*\. We will discuss this question at the end of the proof. 
Proceeding analogously to the steps above, one may show 


,G*) = [T*]laxAr\ei) 


(312) 


where 


o'A,i.(T,6'i) n 2 (A,r*)|A 2 6'i|'' ^(^,, 1 /( 1 /- l)(u - 2)sgn(6»i) (313) 

_^ 

= Qfl(A,r)^ |r5 6'i|''“^C',,i/(i/- l)(i/- 2)sgn (0j) (314) 


and 


where 

K\,iy(T,di) n~^{X,T)\X^6i\''~^C„iy{iy - l){u - 2){u - 3) 


n, 


' 2 I 1 . 


= (llfl(A,r)) ^ \T-^di\'' ^Cyv{v-l){v-2){v-3). 
\d 


We define 


AW) = ^U(e) 


and thus 




= T* + Aa(«*). 


e=e* 


By (304) we deduce 


We now continue with the calculation of the integral in (299). 
exp f, 6)^ = P(f, e) + E{f, 6) 


(315) 

(316) 

(317) 

(318) 

(319) 

(320) 


where 



D. THE LAPLACE APPROXIMATION FORMULA FOR THE MARGINAL 


117 


Define 


OO / I \ A’ « 

del (-1) / A(r*,0*) 


k=l 


kl 


r) / \ def 1 

Pg[x) = 


1 P 

J-c 




exp ( —-P ) dt. 


^ 7_oo ^V 2 

We may then proceed to write 


(322) 


(323) 


h = u[x,T*,e*) +w[x,r,e 

where 


U ix,f* ,0* ] r 2 exp 1—^{x,t* ,6^ 


(271)2 




»'TS/f 

d 


df exp I -- 


^,^,{x,f*,e*)-Y, 


^ ^l^ {x,f*,e*) 


llPG[^l^,{x,r,9*)[e* 


i=l 

and 


%,§Sx,f*,e*)(T-T*) \\ 


(f-f*) X 


(324) 


W ix,T* ,G* ] r 2 exp ( —<h(a;, t* , 6 *)) x 


' tG/t 


df exp I -- 


^,^,{x,f*,e*)-Y, 


^ ^2 (x,f*,e*) 

' wi 




(f-f*) X 


d 


le&K’l 


exp 


0i - 0* + 


V 2 = 1 




E{f,e) de. 


(325) 


We first investigate the term W f*, . The lowest order term of -E(f, 6 ) 

in (322) is \x,f, 6 ), and is given by 


R 3 \x,f,e) = -<l>f^r,T{x,f*,e*){f - f*)^+ 

D 

1 ^ 

+ 9 E a(®’ ) 


2=1 
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i=l 


1 ^ 

+ R order . 


i=l 


( 326 ) 


We will now make a few observations and claims which together will imply 
that it suffices to consider the part of E(f, 0) given by the third order terms 
listed in (326) to compute Vb(£c,f*,0*) to leading order. As will become 
clear from considerations below, the interval If will have to include the point 
f* in order to get convergence of the f—integration step. Furthermore, it 
will become clear that t — f* must be bounded below. In fact we will see 
below that we must have 

ir = [t* — ai-5~^e2^,f* + , 0 < Of <C 1, < bf < oo. (327) 


where Of and are to be chosen large enough to make the integral of 
exp ^—T 2 {x,f, 6 )^ converge with respect to the integration in f. Further¬ 
more, we will see below that we may choose Of = , thus making the interval 

Ir symmetric about f*. This fact will be used to simplify the computations 
below. Next, we observe 


^0 = T^Snedi-ixiiii) - 0 *){t*)^ 


^ di,f 

-i(T* + AA(«n)(T-)A()- 


xM) 


e. 


^ + ^(1 + MA,i.(r*,0*))^ T2 5ned{T*f^9*, 


-^(1 -IiA,!.('r*,6'*))] TU„erf(r*) 26 »* 


C 


by (470), (320), (311). 

^rA^^r\e*) = 5lel 


Tl - d 2^ 1||/ II 2 

-^- + ^\\{r*r^^f 2 + 


d 


+ + ^r*(a;||(i) - 9*)9* by (468). 


i=l 


i=l 


= (l + o(/iA,.(r*,0:)))r by (469), (319), (320). 
= by (473) and (312). 




n — d + 2 3 


+ ;il(r*)2r||2+ 


(328) 


(329) 


(330) 

(331) 
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i=i / 

+ €4o{\\{r*)h*\\l) by (474), (312)-(313). 


= 5lejfkr*40* |-i + 0*)) 

+^axAr\0*){T*)-^6*^ by (472), (312). 

0.(®,T*,0*) = -^5„edT(r*)-^$0.,0,,0.(a;,r*,r)0* 

= -^Snedfcrx,^{T*,e*){T*)h* by (471), (312). 


{x,f\e*) = 6y,ffixAr*,0*) 


T,T, 0 i, 0 i 


by (478), (312)-(319). 

= Anx^uA*,0*) by (476) and (316). 

We will begin with considering the terms in E{t,0) that are 
g, gXx,T*:^*) (^i - a) ■ We dehne 


J def 

LjA) = 


/0eK^ 


2 = 1 


\ - 9*+ 


+- 


X>xAx,f\e*){r-f*)\ \ 1 


(332) 


(333) 


(334) 


(335) 






(336) 


Let 


V ) , 

.... A., del 

nir ,9i) = 


^ejS^A*,d*) 


(337) 


and observe that by (327), (328) and (411) we have 


fA(r-)i9- + f-hr*)i (^x,(i) - l^fl*) a,(l +0 

> - rj{f - T*) 
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>r 26 »*(r*) 2 +r 2 (r*)26»*(5„ed^(l - C)('r - f*) 

> f-he*{T*)"2 A - i(l - C)ar\ , 0 < « 1, 0 < C < 1. 


(338) 


We write 
Lj{f) = 


(2vr)^ 






(21,)- 


1 


d 


li>. \ {x,r*,0*)zi=-ei+fi{f*,e*){i--T*) 


dz exp 


2 ^ ^ 
i=i / 


^,\ ix,f*,e*)zj - r,if*,9*){f - f*) 




(339) 


By (331) and (338) we may write 


Ljir) = 


(271)^ 




-^r2ax,uir*,d*)x 


(rY (*||0') - 


n Pg (r*)^0*(l + C) l + af(l + C)- ^ , 


(271) 


/-I . . , dzj expf-h] ] X 


l>:y*,f*,r)z| - 3^TK{x,r*,e*)z]r,{f*,e*){f - r*) 


+3i>r% ix,f*,9*)Sjr](f*,0*j)if - f*f - )(f - P) 


j\' ,''3 


(271) 






(340) 


i=l,i^jr' 



(^Cj(f-f*)exp (^-|C|(f-f*) 

+Pa{-C0 -r)))f,{f\e*){f-r) 


1 
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(a^,r,r)exp ( -^C]{f-f*) ] f]{f* ,e*){f - T*f 


-Pg (-c,(f - f*)) 0/)(f - rf} 

where 


def 


c, (f - f* ) = ^>1 ->, f*, 0* ) (-0; + fj - r*)] ■ 

We define 


iV(A,u, 7 d) = 5 -V 








^ def logiV(A,U,7d)y n.; Arf ^ N 

= i^v(777)“j .'><‘-<‘°eivc-.A,7j) 


and we observe 
If = 


/cflogiV(A,u,7rf)\2 

" - ' JV(A,., 7 .) ) • 


logiV(A,u, 7 d) V _1 

T + 1 -^- I 


N{X, V, -id) 


implying 
It C. f T* exp 


log^ jV(A,u,7d) 

^^(A,u,7d) 


T exp 


log^ iV(A, u, 7rf) 

^(A, u, 7d) 


(341) 


(342) 


(343) 


(344) 


(345) 


Now we need to evaluate the integral of Lj(f) exp(—l5^e^A^(A, u, 7 d)(f — 
f*)^) over the interval If. We claim that the PG'(-)-term in (340) above may 
be sufficiently accurately approximated by the constant Pg{{t*)^9*) over 
the interval If if N{\, u, 7 ^) is sufficiently large. We proceed to verify this 
claim by evaluating the term U{x,f* ,6*) in (324). By (328), (330), (39), 
(320) we have 


= {l+o{nxAT*^^i 


)))-2{{r*)^6*+ 



122 


D. THE LAPLACE APPROXIMATION FORMULA FOR THE MARGINAL 


- l + o(P..Ur%0t)) - >1 

x\\{i) 1 1 

- 1 + - + Aj(9*){t-)-' 

= (1 + (1 + , { 


l+o{llX,y{T*,e*)) 


x\\{i) 1 


= {T*)2eni + o{iixAr\o:w { 


x\\{i) 1 

- 1^+0 Mr * m 

We define 


1 + 


„ , , fcf logN(A,!y,7d) \ 2 

I JV(A,,.,7d) 

1 + o{^ix,u{r*^^*i)) 


(346) 


gi{5ned{f - f*)) Pg (1 + o (/ ia ,!/( t *, 6'*)))2 { 1 + 

Sned{f-T*) \x\\{i) 1 

Introducing the change of integration variable 
t =^5ned^^^(A,u,7rf)(f - f*) 


(347) 


(348) 


we may write the integral (324) as 


u x,r,o*) { 


r 2 exp (—4>(£c,f*,0’' 


, . d 

{2tt)2 


n^=l ^!jS^^r*,e* 

6 nedN^X,u,'yd) 

pNh[X,uGa)ap / IoAA t \ 

taylor-expanding gi to second order around t = 0 yields 


-1 

> X 


|^N^{X,UGd)a.f / I 

/ 1 dt exp --t^ ) X 

l-N^{X,UGdPf V ^ 
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n { ^ 


// /_it 


1 \N^(X,u,'rd)J ^2 


iV2(A,u,7rf) 2 N{X,u,-fd) 


where: -A ^2 (A, u, 7 ^) 0 ^ < Ct < ^^2 (A, u, 7 ^) 0 ^ 


n5'»(o) 


l-N^ (\,v,'yd)af 


dt exp --t X 


// /_it 


TT K ^_ gt(0) ^ 1 " \N^(X,u,'rd)J ^2 

*yi gi{0)N^{X,ix,ja) 2 ft(0)iV(A, u, 7^) 


rewriting using the chain-rule and the fundamental theorem of calculus we 
get 


llPG{{T*)-^0*il + oigxAr*,Om) \ 

f ((r*)i(®||(i)-i0*)) r 1 
1 + ^-r--^-^exp 


rN^ {X,i^,ya)af / 1 \ 

/ 1 dt exp X 

'-N^ {X,u,-yd)af \ ‘^ / 


-T--^-exp --T*{e*) {1 + o inxA^* At))) t+ 

NHX,u,jd) L 2 J 


((r*) 2 (a;||(z) - ^6*)'^ 
2iV(A,u,7rf) 


exp --r*{et) {l + o{nx,uA*At))) \^ + 


it / ^11 p _ 1 ' 
Ni{x,u,'rd) \ 

l + o(^A,tt(r*,6'*)) 


It is now easy to see from (349) that the first order terms in t and all terms 
of odd order in t will vanish in the integration. Using the choice of af in 
(343) and collecting second order terms we may finally write 

/ \ ^ 

U (x,f*,0*^ = exp <l(a;, f*, 0*)')--x 

^rAxA*A*)-Y^ -^ X 

i=l ) 
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U. 

V^HPg (1 + o 


i=l 


erf V h log iV(A, z/, 7 ^) - (27r) 2 


_1 / /Cf logiV(A,u, 7 d)\ 2 


(2vr)-^ ^ r*(a;||(i) - i0:)(a;||(j) - ^0*) 


1 


7 d) .5^1 exp [\ t * [(0*)2 + (0*)2j) 

+0 (^(iiV“t(A,u, 7 rf)^| , 1 < /cf < logiV(A, u, 7 d). 


When evaluating the integral 
1 


' t(zIt 


Lj(r)exp 


the integrand will include terms that are 
exp (-^SlejN{\, u, 7 d)(f -f* f - \c]{f - f*) 


which map to 
1 


(350) 


(351) 


(352) 


exp -exp -77 If 1 


2 M 7iV(A,u,7d) 


(353) 


def 

when changing variables t = ( 5 nerfA ^2 (A, u, 7 (^)(f — f*). By means of (342) 
and (338) we have the inequalities 

1 


exp \--t ) X 


exp 


--(i-c)V(0;)21 i + a,(i + c) 


{x\\{j) - 


21 


(T*)2e* 


<exp -72 exp -77 


2 ' V " 7^0, ^,7d) 


< exp ( --t ) X 


exp 

V t G 


-2(l + C)V(0;)2(l-m 


1-C 


-ar\/N{X,iy,-fd), ar\/N{X,ix,'yd) 


(354) 
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We observe by (343) that log A^(A, u, 'yd)/N{X, u, 7 ^). Therefore, 

if A^(A,u, 7 d) is ’’large enough” and C ’’near enough” zero, we may write 


exp --t -exp --C. e 


2 r- 1,-1 


2 n " V'iV(A,u,7.) 


~ exp ’ 

VtG -Of V '^d),ar\/N{\, u, 7 d) (355) 

where the ~ means ’’accurate enough” to leading order terms. Using the 
approximation (355) on (351) together with (341) we may get rid of the 
terms of odd order in (f — r*) by writing 


Lj{f) = 


, . d 

(27r) 2 




rfTA,.(r*,0])x 


LL F F 

J] Pg I (r*)^0:(l + C) I l + a^(l + C) 




(27r)-2exp(--r*(0p^l x 


(a:;||(j) - 


(r*)2 6'* 


(1 + o(C)) {2 + r*(6'*)^ + Ajd^elif - r*)^ + odd powers of (f - r*)} 
PSAjPc (^(1 + o{C)){{T*)h* - Ajdnedif - T*))^ (r - f*) (356) 

where 

7l/=l'(r*)^0; + a,(T*)^ (1 + C). (357) 

By proceeding similarly to the steps taken in (347)-(349) we may conclude 
that the integral over of Pg'(-)(t — f*) in (356) may for all practical 
purposes be bounded by Ai“^(A, u, 7rf)(T*)5 (£C|| (j) — ^6*)exp{—^T*{9*)‘^). 
Now, continuing from (341) and repeating the steps (351)-(355) we get 

Lj(T)exp (^-i(52e^iV(A,u,7rf)(f-f*)2^ df 


(27^)"^ay.(r^g;) 


- Y 

^ ^ f 33 T* 


n PG((r*)^0:(l + 


(27r)-2exp --r*(0))2 X 
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(1 + o(C)) < 2 


N^*{X,iy,-fd) ) \ ^ N{X,u,jd) j 


(27r)2erf (^/k^log]V(\^^^ 


Because of the symmetry of If the odd powers of (f — f*) integrate to zero. 
We proceed with the ~ g, (x, t* , 0* )-terms in (326) using the notation and 
results from the calculations (336)-(358) we write 


K,(f) 






§ix,f*,e* 


. (x,f*,G^ 

2 r,T,9j y ’ ’ 


0 


, , d 

(27r) 2 


2 T,T,ej y ^ ^ j 


/4>. A (x,T* ,e*)zi=-9*+ri{f* ,9*){T-f*) 

^3 '^3 


dz exp 


- r,{T\e*){f - f*)\ (f - ry. 


By the previous calculations leading up to (358) together with (333) we 
conclude 


[ . Kj{T)exp -l-N{X,i',-fd)Sl€l{f - f*y dr 

JT&If V ^ J 


(271)^ ^ ^ ^ 


n Pg({t*M 
, .... 


^(27r) 2 exp f-^T*(0*y) X 


(1 + 0 ( 0 ) 


1 3 


-T + tOO + , X 


N(X,i2,jd) V 4 ' ’ + ^ + 


kj- log N(X,iy,jd) '\ 2 

N^yX,v,-fd) j 


+ {2-k) 2 erf 0O log N{X, u, jd 
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We proceed with the ^ f*, 0*)-terms and write 


M,{f)= I de exp (0,-0:+ 

! /. ' ^ li l \ 


2=1 


+- 


4>, \ 1 




(f-r) 


(362) 


(277) 


n^=l^lsS^,r*,e* 




(277) 


/$ ^ (x,f*,0*)zi=-e*+fi{f*,e*){f-T*) 
3 ’ j 


( 1 
dz exp -- 


2 ^ J X 


ix,f*, e*)zj - fj{f*, e *)(f - f*) ) (f - f*). 


(363) 


By (334) and the previous calculations (336)-(361) we conclude 




< 


Mj{f) exp 


- f*f 


df 


/ \ 

(277) 2 




n pg{{t’X(), 


1(277)-! exp X 


(1 + C) 


-(r*)l(xii(i)-^^d*)+T*(0*y^ 


NiX,u,jd) 

kr loglV(A, u, 7 d)\ 2 


axAr*,0:){r*)-^O*x 


yV N>^*{X,iy,-fd) 

0 <kr < logiV(A,u,7rf) 


+ ( 277 ) 2 erf V kr log N{X, u, jd) 


( 364 ) 
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where we again approximated a -PG(')"term like the one in (340) by the 
constant Pg((' 7 ‘*) ^0*) over the the interval If. We then observe that the 
integrated contribution from the (ai, 0*, 'f*)-term will be zero because 
If is symmetric. Comparing the estimated integrated contributions from 
the third order terms in (358), (361), (364) we find that the expression in 
(358) has the leading order except for the contribution of terms of type 

NiXu'yd) fro™ (361). Adding these terms to the expression 

(358) and summing the result up over all indices 1 < j < d using the claim 
(411) we may now bound the total integrated error W{x,t*,6*) with 


w{r,e*) { 


/ \ <^~l~ 1 d 

{2tt) 2 r2 


-1 




n^G((r*)50:(l + C))| 

T*{6*y exp 


1 = 
d 


2 iV(A,u,7rf) + 


+ 


(271)- 


-Ciyuju — 1 | • |u — 2 | ( —n(r*, A 


n. 


d 

E 

i=i 


{T*)h* 

sgn{e*){l+ o{C)) 

Pg 

((r*)50*(l + C)) 


exp ( --T*{e*f ) X 


fcf logiV(A,u,7d )^ 2 

N'^r (A, jy, 7^) 


+ 2 + T*{e*f+ ^ 


A2 + t*(6'*)2' 


iV(A,u,7rf) 


(27r)2erf log iV(A, u, 7 ^)^ | 


(365) 


now assuming N (A, u, 7 ^) is large enough to make Uf y/kr logN/N <C 1 
and erf (kf log A^) ~ 1 we may write 


~ ^ ^(A,u,7d) 


exp ( -^T*{9* f 


+ -Cuixlix - 1| • |u - 2| 

o 


n{T*,x^ 


d 

E 

i=i 


u-l 


{T*) 2 e* sgn( 6 ')) 1 + 




(1 + C)x 

+ 0 


kf ^og N{X,u,'fd) 

(A,iy,7d) 


exp (^^T*{e*y^ 


> ( 366 ) 
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We may now express (299) on the form 


= w{f*,e*) + u{f*,e*) 

r 2 (27r) 2 exp 4 >(a 3 , f*, 0^ 

nti 

d 

(i+o) X 


{x,f\e*)-y 

* ^ ^ ^ - (X T* 0* 1 


d */ n*\2 


^iV(A,u,7rf) V 2 3 


Ciyuju — 1| • |u — 2| 


d (r*) 2 0 * sgn (9*) [1 + j + 

exp (^ir*( 0 ]) 2 ) 


i*\ /i , 2 ^ , ^ / rA:rlogN(A,!^,7d)l 2 

J ) (,1 + 7^ j + ^ ( [ N^.(A,.,7.) J 


(27r)-^ y T*{x\\{i) - \e*){x\\{j) - \e*) 
exp(ir* (0*)2 + (0;)2]) 

+0 

yN{\,u,-fd) exp (^lr*(0*)2) j j 


Now, utilizing (290), (292) we recognize the determinant of the Hessian 
H( X, f*,9*) of ^(a;, t* ,9*) with respect to parameters f,0 inside expression 
(367). Assuming A'(A, u, 7 ^) is large we may write 


12 i— 

h = exp (-4>(a., r, r)) \ , n Po ({rye* (1 + O) 

3 {^n{T*,X*y y exp(ir*(0|)2) 


( 27 r)-^ A r*(a;||(i) - ^0*){xy) - ^9*) 
^(A, u, 7 d) exp At* ( 6'*)2 + ( 0*)2 ^ 
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Using (46), and the relations \ J ^= \H\ and \J'^(j,F = 
where Jip,(p is the jacobi matrix of the transformations (t>{6i), 1 < 
i < d, and H{x,t,9) is the Hessian of ^{x,t,6), we finally get 


Ii = 


/ \ ^" 1 " 1 

(27r) — 


\H{x, T*, 6 *)\ 2 


a 

rfix\T*,0*)7rx{9*)llPG{{T*)h*il + C)) x 


2=1 




d 




+ 3 ( 1 + 0 - 




E 

i=i 


iT*)h* 

sgn (9*) 1 

(1 + 7 ^) 

exp 1 

Ur*i9* 

)^) 

1 


(271)-^ ^ r*{x\\{t) - i0:)(®||(j) - ^0*) 


U Id) 


exp 


(0*)2 + (0;)2 


(369) 


There are some observations to be remarked upon in connection with the 
result (369). 


( 1 ) Of and has to be chosen large enough to make the f-integrals 
//. (•) df in (324) and (325) converge, that is fj_ (•) df ^ f^(-) df. 

(2) We note that if 9* is the hard threshold estimator used by Donoho 

and Johnstone in [DJ94], we have for all nonzero 9* that {t*)^9* > 
\/2 log n and so: 1 > nf=i Pg > Pq (\/ 21 og n). 

(3) We note that the result in [RisOO] concerning IID signal in ad¬ 
ditive white gaussian noise, which in our setting coincides with a 
prior density equal the Fisher information, (which for IID gaussian 
likelihood is the uniform density in 0 *, 1 < i < d), yields asymp¬ 
totically for large n that infi<j<(i(r*) 2 |ai|| (i)| = y^logn -|- o(logn). 


We proceed to estimate the size of N{X, u, 7 ^). By equation (468), (36), (45) 
and the fact that the likelihood / is gaussian we have 

d 

6-\f^,4x, f*,9*) = -{n-d + 2)+Yl (®ll (^) - ^0 

2=1 


( 370 ) 
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By (328), (330) we have 

- (x,T*,0*) d. 

E g., = E (1+» (maAt-.o-))) x 

i=l " j i=l 

('(X||(i) -e-)(r-)4 - l(r- + Aj(e-)(r-)-Ie;) . 

Combining (370) and (371) we get 

{ d ^ (x f* 6*) 

CAA,v.fl-)-E ,.7.. 

= ^(n - (i + 2) + ^ (a:||(i) - 9 *) 0 * t * - ^ (a;||(i) - 
2=1 2=1 

+ E I AA9.-)(r*)-i()* - j(«*)"Aa(«*) - jAl(e*)(T*)-'(«-)" 

i=l ^ 

- E»(w,a(t-,«*)) ((iCilO - 9.‘)(T*)i - 1(T* + AA(9-))(T*)-i« 


(371) 


(372) 


2=1 ^ 
using (320) and rewriting a bit we get 

d 

(n — d 2) -\- 

2 


^^(A, u, 7d) = ^(n - d + 2) + ^ r* {x\\ (i) - 9*) {29* - x\\ (i)) + 

2=1 

+ E |-“Ai7(T*,dn(r*)id: - ^^^xAr*,0*)T*i9*f+ 

i=l 

-ld^(r*,0Dr*(9*fj 

-'^o Ax At*, 9*)) ({x\\{i) - 0*i){T*)^ - hr* + /\x{0*)){t*)-^ 9] 


2=1 


( 373 ) 


rewriting ^x,u{t* ,9*) we get 

1 A ^ 

u, 7d) = -(n - d + 2) + E (®|| (^) “ A) “ ®|| (0) + 

2=1 
d 

+ - 1 ) 

2=1 


12(r*,A=' 


{t*A9* 


v-l \ 

sgn(d*) - - 


A 5 0* 



132 


D. THE LAPLACE APPROXIMATION FORMULA FOR THE MARGINAL 


d 

-E 

i= 


i=l 


( 374 ) 


Now, it is reasonable to claim that the sum t* (a;|| (i) — 0*) x (26* — ®||(i)) 
is either positive, or failing that, very small in absolute value compared to 
^(n — d+2). By (309) we see that ,6*) = 0 and ii\^^{t* ,6*) < 0, when 

0 < p < 1. By (410) we have \^\^i,{t*, 6i)\ < < 1) when 0 < p < 2, 

therefore to leading order it suffices to consider the terms linear in , Gi) 

in the expression (373). In the case 0 < p < 1 we note that the |A20*|-terms 
contribute positively to the right hand side of expression (373) and because of 

the claim (410) the (^n(r*, A*)) ^ (t*) 26 ** sgn (0*)-terms are bounded 

in absolute value by Cl('r*)2 0*| for some positive number C < 1) and since 
the 9i are modelled as zero mean parameters, we may expect a cancellation 
effect to make the number value of the sum of d such terms small compared 
to d. Alternatively: t*{9*Y > 1 and (^fl(A*,r*)) ^ 1 . in the case 

1 < p < 2 we observe that 

\\z\\ij < K(d,v)\\z\\ 2 , Vz G 


p > 1 


where 


Ar((i, p) sup -7—^ = max ("l, rfi' 2 ^ p > 1. 

Ilz|l2=i ll^lb ^ ^ 

Then, using the estimator A* for A given in (401) we may write 
d 


(375) 


(376) 




i=l 


<d, VpG(1,2). (377) 

Inserting our results from the discussion above in (374) we may write 

f d {x,T*,e*) 

iV(A*,p,7d) =<5-Vi ^br(a^,r*,r)-E 




d 


>^{n-d + 2)- ^C^p|p- l|d + E^* (*||(0 -^i) (20* -®||W) 

i=l 
d 


O '^9i,AT*,9*)T*{9*f , when 1 < p < 2, 


(378) 


vi=l 
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=-{n-d + 2)+ '^T* {x\\{i)-e*) [29* -®||(i)) 
i=l 

- o , when 0 < u < 1. (379) 

We note that the result in (378) also holds when using the estimator A* 
given in (403). Alternatively, we may just evaluate the expression (374) for 
a given dataset x, estimator A* and corresponding model as indexed by 7 ^ 
to get the exact value of A'(A*, u, 7 ^). Now we consider the integral I 2 in 
(300). It is difficult to evaluate as we have no natural center about which 
to do a Taylor expansion. Instead we will show that I 2 ^ Ii, by an indirect 
approach. Since we will simply bring Ii and I 2 onto forms that are easily 
compared, we will keep to the coordinates 6, r for simplicity. We need to 


compare 


'ts/t 


dO exp (—<I>(£C, r, 0)) |F(0, r)| 2 


dO WoriXi- 9i)Tlx{0i 


+ i=l 


dO exp (—4>(a;, r, 0)) |F(0, r)| 2 


dO '^Orixi- 9i)T:x{9i) 


+ 2=1 


where we have defined 


9r{x) = ^=exp --TX 


Now consider the integral 

Q 2 {x,t)'^= f TTx{9)gr {x - 9) d9 (382) 

J —00 

changing variables u A 2 0 , recalling the definition of SNR Q{X, r) in (306), 
we get 


Q 2 ix,T) = T^^'^ Tri{u)gi u-T 2 x] du 
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< sup 7ri(w) (—r2(A,T)] ^Pg(—t^x 

i«eR_ I \a / V 


We continue with 


Qi{x,t)=^ [ 7rx{0)gr{x - 9) de 
Jo 


= y 7ri{u)gi u — T^x] du. 


We define 


uo(r2x) r2x 


and we then write 


(•UO 


(5i(x,r) = / 7ri(u)5ri ( ( ^f7(A,r))" (u - uo) ) du 


'0 


n, 


n. 


+ T^/^y 7ri(u)5(i ^^0(A,r)) ^ (u - uo) ) du. 


Now we have 


fUO 


n. 


I 7ri(u)5(i ^-0(A,r)) (u - uq) ) du 

/•UO 


n. 




Taylor expanding 7ri{u) to first order about uq yields 


'UQ 


'^i{u)9i ( ( '^)) “ ^o) ) du 


'UQ 


du (7ri(uo) + 7ri(.f 

UQ ){u - uo)) gi ( ('^y ijj'- uo) 


= ^7ri(uo) (yn{X,T 


+ J '^iiCuo){u-uo)9i{yy^^{X,Ty (u-uo)] du 


( 383 ) 


(384) 

(385) 


(386) 


(387) 


(388) 


(389) 
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where ^uq is some number such that ^ {uo,u). Using the bound (304) 
we may write 


sup |7ri(C„j| < sup 7ri(t) • <^ 

0<^UQ<U t&{uo,Oo) I 


C,, ■ V ■ u, 


u-1 




,Q if 0 < u < 1 

Cu ■ V ■ u'^~^ if 1 < u < 2. 


(390) 


We may then in the case 0 < u < 1 write 


' UO 


AiCuo)iu - uo)gi ( (30(A,r))" {u - uq) ) du 


<Cu-u- sup 7ri(t)-Uo M (u-uo)5i { (u-uo) ) du 


te(tio,oc)) 


'UQ 


n. 


d 


= Cy ■ V ■ U, 
CyV 


u-l 


/ Tl \ — ^ 1 

0 sup 7ri(t) • (-f2(A,T)) (27r)“2 

t£{uo,oo) ^ ^ ^ 


1 \ ^ 1 / Tl 

sup 7ri(t) • ( r 2 a;) 


(27r) 2 te(no ,CX)) 

In the case 1 < < 2 we have 


£±i 

2 


(391) 


'UQ 


Ai^uo)iu - uo)gi ( (^n{X,T))" (u-uo) ) du 


< Cy ■ u ■ sup 7ri(t) / {u — uq)u'^ I ( —r2(A, r)) (u — uq) ) du. 


te(tto,oo) 


' UQ 


n. 


d 


(392) 


Now, by maximizing the integrand in (392) with respect to u G (1)2) for 
each of the cases uq < 1 and uq > 1, we find that the expression (392) may 
be bounded from above for all uq > 0 by 


1 < u < 2 ^ 
CyV 


fUQ 


du 7r[{Cuo){u - uo)gi ( (^fl(A,r)j^ {u - uq) 


< 


(27r) 2 t&{uo ,oo) 


sup vri(t) • ( —fl(A, r 


n , 


d 


1 + Uq + 


n 




2 \d 


n(A, r 


- 1/2 


(393) 


Then by the estimates (388), (389), (391), (393) we may bound Qi{x^t) 
from below as follows 


Qi{x,t) > -T 


i.i/2 


n(A, r 


sup 7ri(t) 

te(tio,oo) 


7ri(uo) 


SUPt6(„o 

,oo) 7 ri(t) 


+erf(rix) _ ‘^lyiAx) 

SUP;:g(tjQ^Oo) ^1 (U (271)2 


( 394 ) 
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where 


LJt-2x) 


(i + TU(5ri(A,T))‘ 


if 0 < u < 1 
if 1 < u < 2. 


( 395 ) 


and uo(t 2 x) = (^Q(A,t)) ^ T 2 X and we have implicitely made the assump¬ 
tion: 7ri(no)/(sup^g]g7ri(t)) > 2Cui'{27r)~ 2 Ly{T 2 x). We may then by (383), 
(386), (394), (395) write 


Q2{x,t) 

Qi{x,t) 


< 


2Pg (^-r2x^ suptgR_ 7ri(t) 


vri(no) 


1 I A™b6(o,^o)’"lW 20.:^ suptg(„Q,^)7ri(l) 1 

i + ert {T2X) ^ L^[t2x} 


(396) 


We note that if tt\{6) is a monotone decreasing function of \9\ we may write 
2Pg (-t^x) 7ri(0)/7ri 


Q2(a:,r) 

Qi{x,t) 


< 




1 -|- erf (t2x) — LyiT2x) 


(397) 


We may write 

h _ h + h — h 
h ~ 


= -1 + 


/ 1+/2 


= -1 + 


h h 

Ir&uULi {Qi{x\\{i),T) + Q 2 {x\\{i),T)) dr 
Ireir ni=i(3i(®||(i),^) dr 


using the integral mean value theorem we get 

l^rinf=i (<5i(®i|(^),n) + Q2(®||(i),n)) 


= -1 + 


= -1 + 


l^rl Y\j = lQl{x\\{j),T 2 ) 

ntiQ.(x||W.n)(l + ggg;^) 

n?=i<5i(*ii(j)>^2) 


, for some ri, r 2 G It 


(398) 


Now, if I It I is chosen small enough, then ri ~ r 2 , and using the bounds on 
Qi{x^t) and Q 2 {x,t) calculated above, we may write (398) as 


h 

h 


-i+n(i+ 


i=l 


Q2iX\\{l),Ti) 

Ql{x\\(i),Ti) 


( 399 ) 


<-i+n{i+ 


2=1 
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2Pg 





SUPt6K_ 7ri(t)/7ri 



1 , tr 1/2 ‘6(o.“o(u'''^-||W)) 


TTl{t) 


2C^u 

(2vr)2 


Lu{T^X\\{i)) 


^^Pfe(no ,oo ) ^l(l) 

(400) 


where we have by (345) that 


ti,T2 e It C 


exp 


/ log^ jV(A,u,7rf) \ 2 

V /\^(A,u,7d) J 


T* exp 


/ log^ Ai(A,u,7d) \ " \ 

V /v(A,u,7d) y j ■ 


with Ai(A,u, 7 rf) as given in (378) and (379). 


We did begin the proof under the assumption that 6 * G But when 
considering the calculations leading to the expression in (324), we see that if 
0 * < 0 for some 1 < f < d, then by changing the domain of integration from 

M+ to M_ on the 0j-axis in (299) and correspondingly changing the domain 
of integration for I 2 in (300) so that the union of integration domains in 
Ii and I 2 is what we get in formula (369) is simply that 6j changes to 
—6* inside the PG(')-expression. Thus, if we replace 9* by \9*\ inside the 
^(^(y-expression in (369), and likewise replace x^^{i) by |®||(/)| inside the 
^(^(y-expressions in (400), we see that our proof of the formulas (369) and 
(400) is invariant of sign changes on a;||(i) and 9*. 

There remains one question that need to be answered before the proof 
can be said to be complete, that is the problem of estimating the parameter 
A. Since A“^/^ is the second order moment of the prior density Trx{0), we 
could simply define 


i=l 

This would lead to 
d— 

n{X*,T*)= ■ 




* '\2 


n- 


n- 


= -E 

n ^ 

1=1 


t*{9^ 


* a2 


Alternatively 


1 (tef J 1 

1=1 


- -T 


(401) 


(402) 


(403) 
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leading to 

J7(A*,r*) = ^ = 1 - 4 • (404) 

Another way to proceed which is more in line with the philosophy of the 
MDL-principle would be to define 


A* = arg min;^>o{-log 


(405) 


That is we select the value of A minimizing the codelength of our dataset x 
given the model 'jd- We will generally prefer this maximum likelihood form of 
the moment estimator A* because of its codelength optimality and because it 
also simplifies computations. A special case of interest to us is ttx{ 6 ) belongs 
to the class of priors known as ” Generalized Gaussian Distributions” (GGD) 
which may be expressed on the form, [ML99] 


TTxie) 


urj{u) 

2T{l/u 


■A 2 exp (^—ri{vy 




! ^ del /^r(3/u) 

= (iwo 


(406) 


By (302)-(304) and because — log7rA(6*) is taken to be a symmetric, non¬ 
negative function of B with a decay limit as stated in (303), we see that 
the family of priors under consideration in this proof includes the family 
of GGD-distributions. In the special case of a GGD prior the Maximum 
Likelihood (ML) estimator A* for A is 


1 del { vri{v)'- 


[A1. V 

This leads to 


d 


Ei[« 

2=1 


*1 

i J^l 


d— 

L!(A*,r*)= 


urj[u 


n-r Xn^d 2 




2=1 


The second question is the claim (311) which is 
Ima,j.(t*,6'*)| < Cma,. < 1 < i < d. 


(407) 


(408) 


(409) 


Rewriting (309) yields 


\fixAr*,o*)\ = 


= - 1 | — 


n 


(A) 


(t*) 




u-2 


= Ciyiylu — 1| (■^^(A, t’' 


n. 




u-2 


, 0 < u < 2 


(410) 
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The claim (409) may then be expressed as 


sup < Cjyulu — 1| I —n(A,r 


l<i<d 
where 0 < u < 2. 


n. 


d 




u-2 


-V <1 


(411) 


It is now clear by considering (411) that (409) will be satisfied for ’’rea¬ 
sonable” values on the SNR n(A*,r*), the relative model size the tail 
parameter 0 < u < 2 and the tail constant C^, on the prior distribution tt\. 
The proof is now complete. 




APPENDIX E 


The marginal normalization 


We must address the problem of calculating the normalizing constant 
defined in (73). Clearly, will depend on our choice of the domain 
Y 3 z on which m^^{z) is normalized to be a density. We will take care in 
choosing this region Y as it will possibly have significant influence on the 
model selection principle we will end up with. The given data set x must 
be contained in the region Y. The geometry of the region Y is determined 
canonically by the model index vector 7 ^ and the form of the estimators t* 
and A*, as will be demonstrated below. We will concentrate on the generic 
case of priors vr^ defined in Theorem 4.1. 

Given a data set ic, a model as indexed by 7 ^ and assuming the conditions 
in Theorem 4.1. We then need to calculate 

^ (27r)"^ r f{z\T*,e*)7r{e*\X*) 

\It\ ■ \h\ Jz£Y \H{z,t*,6*)\^\'Yxx{0*,X*)\^ 

|n^G((r*)^|0:i{l + o(C)}^)| (412) 

where 6* = 0{z^^), t* = t*{z) are the MAP-estimators defined in (37) and 
A* is the estimator for the parameter A defined in (63). Using (36) and 
exploiting the orthogonal decomposition Y = Pj_ © 1)1 induced by the model 
7 rf, we may express the invariant MAP-estimator t* for the noise as 

77 ^ = ^2 + ll^ii -^"(^li)!!!) > e Tl, 2:11 G y||. (413) 

We evaluate the determinant of Hessian H{z, t* , 9*) of 4>(z, r, 9) by means 
of (36), (290), (291), (318), (320) and we get 


\H{z,t\9*)\ = 


n — d + 2 




1 - 


2t* 




. 2=1 
2 




(414) 
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where 


def 


- 1 ) (^ 


n, 




v-2 


( 415 ) 


We note that the matrix H{z, t* ,6*) is singular for t* = 0. By (414), (413) 
and under the conditions in Theorem 4.1 we may write 


\H{z, T\e*)\ = ]^{n-d + 2){T*f-^ exp {d • o(C)) x 


1 - 


2t* 


\h-o*\\1 


n — d + 2 1 + 0 (C) 


(416) 


We note that by expression (416) the matrix H{z,t*,0*) also becomes 
singular when r*||z|| — 0*||2 increases from zero and becomes large enough. 
Inserting (413) into (38) and exploiting the orthogonal decomposition z = 
z_L + zu, we get 




/ , d+2 

(271) — 

^rl ■ Ih 



n — d + 2 


dz\\ dzj_ 


'ziieYil, 



X 


_ 7r(r|A*) _ 

H{z,T*,e*)\^ |Taa(6>*,A*)|5 




0=1 



Exploiting the spherical symmetry of Y± as induced by the form of the 
estimator t* in (413), we change to polar coordinates in z±, that is we set 


R 


2 


def 




||2 

Il2) 


Sk{r) = 


712 A: 

r(| + i) 


r 


fc-i 


(417) 

(418) 


where Sk{r) is the surface area of a /c-dimensional hyper sphere of radius r. 
We find it convenient to use instead of r as a integration variable, thus 
we make a change of variables r ^ which gives 


5fc(l)r'' ^ dr^ dr'^. 

We may then write 


^ exp 5„_rf(l) 

2 ( 271 )"^ • \h\ 

n — d — 2 

TT{e*\x*) (i?i)^^ 

H{z,T*,e*)\^ |Taa(6>*,A*)|5 



(iz|| dR\ (r*) 2 X 


nrG((r*)i|e*|{i 


^ 2=1 



(419) 


(420) 
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where J C K_|_ is an interval. We will find it convenient to change integration 
variables in the integral (420) from {R'j_,zT) to (r*, (6*)'^). We define 


y=\Rl,zl 

Using (36) we define the gradient vector Q as 




dot 


= 0 . 


(421) 


(422) 


cx.=(3 


By (422) the total differential of Q{y,(3) along jS = (3* may be expressed 
formally as 


0 = dQ = 


dQiy,P) 


dy 


dy + 


dQ{y,f3) 


( 3 = 13 * 


df3 


d(3* 


(423) 


(3=13* 


This yields the formal expression for Jacobian as 


dy _ 

( dQ{y,P) 

) 

dQ{y,f3) 

d/3* 

\ dy 

^=(3*) 

df3 


f3=f3* 


Now, we observe that 

dQ{y,P) 


d(3 


= H{z,T*,e* 


(424) 


(425) 


/3=/3* 


and 




dQ{y,f3) 


dy 


(3=13* 


O-l 0,2 


0 -T* 


0 


0 0 -T* 


ad \ 
0 
0 


\ 0 0 


(426) 


0 -T* ) 


where aj —(a;||(j) — Oj), 1 < j < d. The Jacobi-determinant for the 

change of variables y ^ (3* then becomes 


dy 


(dQ{y,l3) 

\ 

dQ{y,l3) 


d(3* 


V dy 

/3=(3*J 

dp 

(3=13* 


*\d 


= ( 2 (--) 


-1 


\H{z,T*,e*)\. 

Define the inverse function 0 ^^(q:) of q = 0*(z|j) by 


(427) 


def 




( 428 ) 
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assuming such an inverse exists (0* / 0, 1 <i < d). Using (413), (427) and 
the relation z\]^{G*) = 0*^(0*) , then (420) becomes 




exp {-^^) 


n—d+2 

^ ' Sn-d{l){n -d + 2)^2\Ir\-^\h\-^x 

2(27r)^^ 


-d+2 


'0*e©*, T*&j* 


dO* dr* {T*) — n{e*\\*)\H{z,T*,e*)\^ \^x\{0*,\*)\~^x 


1 - 


n — d + 2 


*|| 2 \ 2 


nPG((r*)5|0:|{l + o(C)}^ 


(429) 


^ 2 = 1 


where 0* C M‘^\ {0} is some set still to be chosen subject to the constraints 
of containing the MAP estimate 0* and minimizing the total codelength 
expression (106) while A* is constant on the boundary 30* of 0*. Also, 
J* C M_|_ is a bounded interval. Inserting (416) into (429) yields 




exp (-^^^) 5„_rf(l) „-d-i (d 

{n-d + 2) 2 exp(-o(C))x 


/— , ^ n — d — 2 

\/2(27r)^— 


• \h\ 

7r{e*\X* 


/ d9* dr* 

le*£e*,r*£j* |4'aa(^*, A *)|2 


1 - 


T*||Z||(0*) - 0*11^ 
n — d + 2 


1 - 


2 r*||z||(0*) - 0*y 

1 + o(C) n — d + 2 



l»,*l {1 + 0(0} j) 


(430) 


We will need bounds on ||z|| — 0*||2- Using the notation from Theorem 4.1 
we have 

7 r(0|A) = C • A 2 exp /(A 20 )^ 


for some constant C > 0. The MAP-estimator 9* is given by 


9* = arg min 


-t{x - 9y + /( A 20 ) 


which yields the solution 9* expressed by 




(431) 


( 432 ) 


e=e* 
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Using the bound on /' stated in (52) in Theorem 4.1 together with the 
expression (432), we get 




2u-2 


(433) 


i=l 


To bound the righthand side of (433) from above we will make use of the 
claim >1, \/ i e jd, V r G Ir in Theorem 4.1 together with the 

norm inequality relation for norms on 


||®||p < K{d,p)\\x\\ 2 , X G M“, 1 < p < oo 
where 

K{d,p)'^ sup -|Ap = max V 

||cc||o=i l|a;||2 V / 


(434) 

(435) 


First we consider the case 0 < < 1. Recalling the definition on the SNR 

(306) we write 


—r^ei 

r2 


2u-2 


2u-2 


w-, - o’wl < 

2=1 

= ic„V();n(A.r))-^X:|A« 

2=1 

< (^U(A,t)) d, Vi^G(0,l] 


1 ^2 2 f - a 

= -cy(-) E 


2u-2 


2=1 


(436) 


where in the last inequality in (436) we used that T(0*y > 1. In the case 
1 < 12 < 2 we may write 




2=1 


1 

u 

A2d; 



A2 1 ^ 

r2 


rh* 


u-2 


= -Cy(-Q(A,r}) 

2=1 

<^cyQQ(A,T)') 

2=1 

< yy QQ(A,T)y^ (R(d,i2)iiA5rii2) 


iy-2 


1 


T 

1 


= yy{^n(\r}) 


di 


n 


= -cy(-nix,T)) Vi2g(i,2) 


(437) 
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where we have made use of the expression (401) for the size of A. We will 
have to choose the regions 0* and J* of integration subject to the claim 
that the Theorem 4.1 is valid. Thus, we will have to ensure the Hessian 
H{z,t*,9*) is non-singular over the region of integration. Using (436), 
(437) we may now write 


r*\\z\\{9*) 

- 9*\ 



y 9* ee* 

, V t" 

' eIrCR+ 

(438) 

where 




HO = { 

V 

iz/2 

if 0 < 1/ < 1 
if 1 < 1/ < 2. 

(439) 


By combining (416) and (438) we get 


1 - 




detH{z,T*,9*) > ^(n — d + 2)(r*)'^ ^exp (—dC) x 

-h(u)' 


n — d + 2 


1-C 


which will always be a positive number if 

n — d + 2 1 — C 

and 0 < C < 1, and t* > 0. 


(440) 


(441) 


We observe that the integral (430) diverges in t* at infinity. The description 
length as given by — log decreases with decreasing The 

expression (415) tells us that ^ oo as r* ^ 0. Because of the 

claim (53) in Theorem 4.1 the left end of J* must not be ’’too near” zero, 
unless n\^y{T*^9*) = 0 which is the case for priors flat in 9. However, by 
equation (413) we see that r*(z) ^ oo as || 2;±||2 ^ 0 and ||z|| — 9*\\2 0 

and T*{z) is bounded from below by a positive number when z_l ^ x±_ and 
Z|| ^ a;||. First we discuss the case ||z|| — 0*(z||)||2 = 0. This means that the 
estimator 9* is the ML-estimator 0*(z||) = zy corresponding to the choice 
of a prior distribution ti\{9) which is uniform (flat) in 9 and is centered in 
the origin. This is the case discussed in [BRY98], [RisOl]. In this case we 
have C = 0 because this distribution is infinitely differentiable at the 

origin, the term nti Pg in (430) may be replaced by 1. We then 

have 




exp 


f _ n—d-l-2 ^ 


\/ 2 ( 2 ' 


- d + 2) —|/.|-Va 


1 - 1 . 


TT 


2 
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de* dr* 


Tr{e*\X* 


/0*e©*, r*eJ* 


|^aa(0 *,A*)|2 


(442) 


We continue with the case of priors non-flat in 6 and flat (constant) in r. 
By (438) we have the following bounds 


1 > 1 - 


> 1 - 


/T-* 11 112 \ 2 

^ ll^ii II 2 \ 


n — d + 2 
d 


1 - 


2 T*\\z.-G*f' 


n — d + 2 


1 -|- o(C) n — d + 2 

/ \ 


i — d — 2 

— h{v)\ 2 


2d clv^ {%n{\\T*)) 


1 - 


n — d + 2 


1-C 


(443) 


Using the integral version of the mean value theorem on the PG(-)-part of 
the integrand we may state the following bounds 

exp Sn-d{l) , , 

J.U I? if, (n - d + 2)^ exp -o(C) x 
v/2(27r)^— lUllUl V2 / 

d 


1 - 


d-2 
— h{v')\ 2 


2,,2(nc)(\* 


1 - 


2d Clv^ ,T*)) 


n — d + 2 


1-C 


|n^’G(K)ba<l{l + »(C))’)| 


de* dr* 


T:{e*\\* 


'0*e0*,T*eJ* 


|^ aa ( 0 UA *)|2 


— ^id 


< 


exp 


n—d+2 


) 5n-d(l) 


v/2(27r)^ \h\\Ir 


{n-d + 2) 2 exp -o(C) x 


d 


|^^’G((^)ha^l{l+»(0)i)| 


7r(0*|A* 


/ de* dr* -r 

’e*£e*,T*£jf | 4 ' aa ( 0 *, A*)|2 


(444) 


for some ^ G J* and some a G Q*. Now, applying the Stirling approximation 
[AS70] to the B-function in (418), we may write 


n — d 


+ 1 =r 


n — d + 2 


= (27r) 


n — d+1 

n-d + 2\ 2 


exp 


n — d + 2 


1 + 0 


1 


n — d + 2 


( 445 ) 
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and using the bounds; ^ = Pg{0) < Pg{x) < 1, V x 
bound as follows: 


n — d 
n — d + 2 


1 - 


exp 


d 


n — d—2 




1 - 


n — d + 2 
2d 


n — d + 2 

de* dr* 


0*&0*,T*&Jf 
Yd 

n — d 


1-C 

7r(6>*|A*) 

|^aa(0*,A*)|^ 


— ^id 


< 


n — d + 2 




de* dr* 


The result in Propostion 6.1 follows. 


> 0 , we may now 


7r(6>*|A*) 

|TAA(0TA*)|i 

(446) 



APPENDIX F 


The partial derivatives of l>(£c,f, 0) up to order 3 


Define 


and e* = cj>{e*,T). 


Using the independency of the parameters 6i, 1 < i < d and the functional 
relations r = '4’i^) 9i = (p{9i,f) as given in (39) we may write 




d^{f) 


df 


f=f* ,0=0* 


+ ^<^>eSx,T*,e* 


d4>{9i,f) 


2=1 


df 


T=T*,e=e* 


(447) 


4 >. {x,r,e*) = ^ed^,r*,e*) 


d(f){9k,f) 


d9k 


<^f^r{x,f*,e*) = <^rA^,T*,e*) 


( di’A) 


df 


+ <h^(£c,r*,0 

d 


d‘^'ip{f) 


df^ 

+ 2^cl>e„r(®,r*,r 


T=T*,e=e 

d(j)0i,f 


2=1 

d 


df 

d‘^<j){9i,f) 


T=r*^e=0* 


r=r* 0 = 0 * 


d'ipij) 


T=T*,ei=e* 


df 


T=T*^0=0* 


2=1 
d 


r=r* ,0=0* 

d(l)0i,f) 


2=1 


df 


T=f*,e= 0 * 


(448) 


(449) 
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d9k 


( 450 ) 


T=f*,e=0* 


d<j){6k,T 


d‘^(P{9, 


k,r) 


+ ^ek,T{x,T*,e*) 


d9k 

d<j){9k,T 


dfd9k 


f=T*,e=0* 


d9k 


T = f*,0=0* 


f=f*,6»=0* 


dr 


r=r* ^ 0 = 6 * 


d4>{9k,f) 


df 


(451) 


r=f*,e=e* 


4> 




d4>{9k,f) 


d9k 


d^(f){9k,T) 


T=T*,e=e* 


dfd9k 


T=f*,e=0 






( d(t){9k,T) 
dA 

d(t>{9k,f) 


dip{f) 


d9k 


T=T*,0=0* 


r=r*,0=0* 


df 


T = f* ,0=0* 


d(p{9k,f) 


df 


• (452) 


r=r*,0=0* 




+ <^r,eAx,T*,e* 


( dil){f) 

\ d(l){9kA) 

df 

T=T*,0=0*J d9k 


r=T* ,0=0* 


dA{f) 


df"^ 


d4>{9k,f) 


T=T*,0=0* 


+ 2^e,,r{x,T\e*) 


d‘^(j){9k,f) 


+ 24>e,,0,,,(a;,r*,r) 
+ <^e,{x,T\e*) 


d9kdf 

d(f>{9k,f 


d9k 

d'tp{f) 


T = T*,0=0* 


T=T*,ei=d* 


df 


r=r* ,0=0* 


df 

d^(l){9k,f 


d(J){9k,f) 


T=T*,0=0* 


d9k 


d'il){f) 


T = T*,0=0* 


df 


r=T* ,0=0 


d9kdf‘^ 


r=r* ,S=6* 


n*^ 

~^^0kfikAl'^ A ) Q^2 


d4>{9k,T) 


r=f*,0=0* 


d9k 


T=f*,0=0* 
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d4>{0k,T) 


d'^(p{0k,f) 




\r=r*,0=0^ 

\ 2 


d4'{0k,f) 


\t=t* ft=0* 


d(f){ek,T) 

dOk 


\t=t* ,6=0* 


\AA(“=A',e-) = <l'<I.AA(!t,r-,r) 




r=T* ,6=0* 




d'ip{f) 




f^'T '' '' c)'t‘^ 

Ul ut 




dip{f) 


\r=r* ,6=6* 




\t=t*^0=9* 


d(f>{0i,f) 

df 


\ r=r* ,6=6* 


+ <^rix,T*,e* 


+ <^rA^,r*X)^ 


<2/ I 

5^'0(f) I dip{f) 


\t=T*,0=0* 


\T=T*,0=e* 


sr^ ^ ( * a*\ 

+ Y.^r,eM.r\e*) 

i=l T=T*,0=0* 


dA0i,r) 

dr 


\t=t* ,6=6* 


+ 2Y,^g^,rA,T\e* 


dAiOi 


\r=r* ,9i=6* 

1 t I 


i=i T=T*,Si=e* 

’ t i 


dip(f) 

„ T=T*^e=0* 

dAjr) 

df"^ f=f*,0=0* 


+ 2^^e,,r,rA,r\e* 


dA^iA) ( dAA 

df . . \ df 

f=T*,ei=e* \ 

1 t 1 


\T=T*,e=e* 


i=i \ T=T*fi,=e* 


|f=f*,0=0* 


2 
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i.r 


2=1 

d 


df^ 


r=r* ,0=0* 


2=1 

d 

+ ^^e.Ai^,T*,e*) 
2=1 
d 

2=1 

d 

2=1 

d 


2=1 


d‘^(f){9i,f) 


df"^ 

f=f 

d‘^(j){9i,f) 


9f2 

T = ' 

d<j){9i,f) 


’ df 

r= 

n f d(p{9i,f)\ 

’ df 

1 

( d<p{9i, 


' 1 af 


d'tp{f) 


df 


r=r*,0=0* 


d(J){9i,f) 


df 


f=T*,0=0* 


d^(j){ei,f) 


r=r* ,0=0* 




T = f*,0=0* 


d'ip{f) 


T=T* ^0=6* 


r=f* ,0=0* 


df 


r=f'*‘ ,6=0* 


(455) 


Now we need to compute all nonzero partial derivatives up to third order of 
the parameter mappings ^p{f) and 4>{9i,f) to get the desired order estimates 
of the coefficients of Tg. Recall the definitions of V’('^) and 4>{6i,f) in (39) 
and let the dimensionless numbers a, 5n and be defined as 


a 


1/2 d|f 




1 def 

and Cd = 



and 5n 


def 



(456) 


where r and r are dimensionless positive real numbers. We claim 1 < d < n 
and 0 < f and 0 < f and 0 < <5^ < 1- We may then write 


dTp{f) 
~d^^' 
d‘^'ip{f) _ 
df^ ^ 
5^'0(f) 
df^ 

d<j){ek,f) 

dOk 

<90(4, t) 

df 


20(f) : 

= Sned'lpif) = SnedT. 

(457) 

“V('r) 

= = ^IddT. 

(458) 

“i0(f) 

= = €4^- 

(459) 

1 1 
T^'lp 2 

(f) = f^T~2 . 

(460) 


-ir 2 a 240 ^f") = 2 (f) 


— 2^n^d9k- 


(461) 
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d^4>{6k,T) l_l _l 1... l_l 

_^__ = --r 2 a 2 ^ 2(t) = --T^5n€d^ 2 (r) 
oBkOT ^ ^ 

= -^f^KedT~^. (462) 

^ \ = ^SlelBk- (463) 

= ^f5a-V^(T) = = ^fUlelr-^. (464) 

^ = -^f^a-Uk'il’-Hf) = -^fUlelek'ip-Hf) 

= -^^l4Gk- (465) 

We may now combine the results in (457)-(465) above with the calculated 
partial derivatives of <h(a;, f, 6) with respect to the parameters f and 0*, 1 < 
i < d in (447)-(455). We then get 

1 

^r{x,r,e*) = ^^ix,T*,e*)SnedT* - -6nedY.<^9,ix,T*,e*)9l (466) 




^rA^, f*,e*) = 4 >,,.(*, T\e*)6Ai{r*r + ^Ax, r*,e*)SleA* 

d d 

- ^I4r* Y1 ^s,A^, T*,9*Wi + j€4 T*,e*)e* 


1 




= --^ekA’'^*A*)r2Sned{T*) 2 

+ ^e,Ax,T*,e*)fAned{T*)-^-^<^e,,e,A,T*,9*)fUned{T*)-'^0l (470) 


= -4>e,,0,(®,r*,r)r(5„ed(r*) ^ 

+ ^ek,ek,T{x,T*,9*)A(^df - ^Snedf{T*)~^^ek,0k,dki^^'^*A*Wk- ( 471 ) 
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= ^r,T,eu{x,T*,e*)T^2{T*f2Slel 

+ \^e,{x,T\e*)5leyHT*)-'^ 

+ \^ekfiu{x,T* ,e*)5lelf^^ 

+ \^ek,ek,eu{x^T* ,e*)5lelf^{T*)-^{eif. ( 472 ) 

= ^eM,{x,T\e*)fi{T*)-i. ( 473 ) 


4>^,^,^(a;, f\e*) = 34>,,,(ai, r*, r )<5;)e^(r*)^ 

d 

+ 4>,,.,.(®, r^ r )<53e3(^*)3 _ ^ r )53e30*(r*)2 


i=l 


+ 4>,(®, T*,e*)5lelT* - 1 ^ T*,e*)5leye* 


i=l 


+ \Y.^rAA{^:r\e*)5ldiT*{e*f -\Y,^sA^^T\e*)5lele* 


i=l 

d 


i=l 


^ T*,e*)5y,{e*f + J E T\e*)sldMf. 


i=l 


i=l 


( 474 ) 



APPENDIX G 


Some fourth order partial derivatives of 


e*) 


Differentiating the expression (452) with respect to f we get 


/ dAhA) 

\ / d^A^kA) 

V dOk 

J y dAdOk 


f d<j){ek,’ 


V dOk 

f d(f>{ek,f 

V dOk 

f d4>{ek,T) 

V dOk 




k,r 


dfdOk 

dfd§k 
d‘^(j){9k,T) 
dOkdf 


4>. . . = 24>0, 0,(a;,r*,r) f 


+ 2^e„e,A^.T\e*) 

+ 2^e„e,A^,T\e*) 

+ ^e„e,,rA^,T\e*) 
+ ^eMkA^,r\e*) 
+ ‘^^0M,{x,T\e*) 

+ ^0,AA,r(®,T*,r 


d^A^k,' 

dfdOk 


dip A) 


df \ 

\) 

^ dA^k 

A) 

df 

K 


d'lpA)] 

li 


df 


/ dAOkA) 

y (dAA)] 

V dOk 

j y dA 



( dAOkA) 
V dOk 


d‘^AG 


k,T) 


dOkdf 


dAGk,r) 


dr 

dA&kA 

df 



dip{f) 

'df~ 
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+ ^e„eM{x,T*,e*) 


d(p{9k,f] 

dOk 


5(/>(4,'r) \ / d4>{ek,T 


where ] means evaluating the derivatives in t = f* ,0 = G*. By differentiat¬ 
ing (454) with respect to 9k we get 


By differentiating (454) with respect to f we get 


= 'f'».AAA(^.c,e-) 


9<Pi0kA) \ / 9<p(0kA) 






5V’(t')1A / d(J){9k,T) 


d^(f){9k,f) \ f d(p{9k,f) 
d9kdT ) I 89k 


In the case of a gaussian likelihood function, the expression (475) reduces to 

= ^ek,ek(.x,T* 


- $0, A,r(®, T*,e*)Sl€if + 4>0,,0,,0,(ai, T*,e*)6l€lf9UT*) ^ 

+ \^e,,eM,{x,T\en6yM0l)\r*r^ 

and the expression (477) reduces to 


{x,f*,e*) = --^0^^^e^^e^^0^{x,T*,e*)5nedT2{T*) ^Gk 




3^ 

2 ^Skfikfik 


(®,r*,0*)(5n€dr2(r*) 2- 


(479) 
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H. NUMERICAL RESULTS 


Table 1. The results from the test image Barbara, N = 
512 X 512, wavelet used: Symmlet 16. The first row in each 
entry in the table is the scaled RMSE measure in (213), the 
second row is the SNR of the reconstruction, see (212), the 
third row shows the proportion of nonzero wavelet coefficient 
estimates 9* as a fraction of the sample size N and the fourth 
row shows the estimated value of the GGD shape parameter 
u provided by the INMDL algorithm. 


SNR 

Risks brink 

SureShrink 

^(0.70) 

^ MAP 

7T-l(l-0) 

-^MAP 

NML 

INMDL 

1.0 dB 

24.3% 

13.1 dB 
0.0961% 

20.0% 

14.8 dB 
0.790% 

23.4% 
13.4 dB 
0.155% 

24.2% 
12.9 dB 
0.810% 

93.4% 
3.88 dB 
39.7% 

33.5% 
10.6 dB 
1.34% 
1.02 

5.0 dB 

34.0% 
14.2 dB 
0.188% 

28.5% 

15.8 dB 
1.64% 

32.1% 
14.8 dB 
1.52% 

35.6% 
13.9 dB 
18.8% 

81.1% 
7.60 dB 
19.9% 

37.8% 
13.5 dB 
1.35% 
0.913 

10.0 dB 

51.2% 
15.7 dB 
0.434% 

40.3% 

17.8 dB 
8.83% 

52.4% 
15.6 dB 
11.1% 

62.5% 
14.2 dB 
55.6% 

59.2% 
14.6 dB 
5.47% 

47.9% 
16.4 dB 
1.92% 
0.842 

15.0 dB 

72.6% 
17.7 dB 
1.19% 

49.4% 
21.1 dB 
15.9% 

67.7% 
18.4 dB 
27.7% 

79.4% 
17.1 dB 
77.1% 

59.8% 
19.5 dB 
4.94% 

58.6% 
19.6 dB 
3.20% 
0.824 

20.0 dB 

102% 

19.8 dB 
2.81% 

66.3% 
23.5 dB 
24.1% 

79.3% 
22.0 dB 
44.9% 

88.8% 
21.1 dB 
87.9% 

75.0% 
22.5 dB 
7.22% 

78.7% 
22.1 dB 
5.67% 
0.815 

25.0 dB 

122% 

23.3 dB 
5.14% 

75.3% 
27.4 dB 
30.0% 

85.6% 
26.4 dB 
57.1% 

93.4% 
25.6 dB 
92.9% 

89.6% 
25.9 dB 
9.4% 

95.1% 
25.4 dB 
8.10% 
0.791 

50.0 dB 

971% 

30.2 dB 
14.0% 

441% 

37.1 dB 
51.0% 

129% 
47.8 dB 
74.2% 

100% 
50.0 dB 
97.4% 

797% 
32.0 dB 
17.4% 

873% 
31.2 dB 
15.8% 
0.703 
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Table 2. The results from the test image Lena, N = 512 x 
512, wavelet used: Symmlet 16. The first row in each entry 
in the table is the scaled RMSE measure in (213), the second 
row is the SNR of the reconstruction, see (212), the third row 
shows the proportion of nonzero wavelet coefficient estimates 
9* as a fraction of the sample size N and the fourth row shows 
the estimated value of the GGD shape parameter u provided 
by the INMDL algorithm. 


SNR 

Risks brink 

SureShrink 

^(0.70) 

^ MAP 

7T-l(l-0) 

-^MAP 

NML 

INMDL 

1.0 dB 

17.9% 
15.9 dB 
0.0755% 

14.1% 

18.0 dB 
0.544 % 

17.1% 
16.2 dB 
0.122% 

17.9% 
15.7 dB 
0.764% 

93.0% 
3.91 dB 
38.9% 

31.8% 
11.2 dB 
1.17% 
1.02 

5.0 dB 

24.0% 
17.3 dB 
0.138% 

19.1% 

19.3 dB 
0.935% 

24.8% 
17.1 dB 
1.24% 

31.0% 
15.2 dB 
18.4% 

78.5% 
7.87 dB 
17.8% 

32.6% 
14.8 dB 
1.21% 
0.901 

10.0 dB 

34.2% 
19.3 dB 
0.300% 

26.1% 
21.6 dB 
2.39% 

46.1% 
16.8 dB 
10.0% 

60.5% 
14.5 dB 
55.3% 

48.1% 
16.4 dB 
3.52% 

35.7% 
18.9 dB 
1.50% 
0.807 

15.0 dB 

47.0% 
21.5 dB 
0.658% 

34.3% 
24.3 dB 
5.65% 

65.0% 
18.8 dB 
26.6% 

79.0% 
17.1 dB 
77.3% 

44.5% 
22.0 dB 
2.42% 

41.7% 
22.6 dB 
1.94% 
0.742 

20.0 dB 

62.8% 
24.0 dB 
1.34% 

45.9% 
26.7 dB 
10.6% 

77.6% 
22.2 dB 
44.2% 

88.8% 
21.1 dB 
88.3% 

53.1% 
25.5 dB 
3.05% 

52.4% 
25.6 dB 
2.89% 
0.700 

25.0 dB 

84.4% 
26.5 dB 
2.42% 

60.2% 
29.4 dB 
17.3% 

93.8% 
25.6 dB 
93.5% 

85.6% 
26.4 dB 
58.5% 

70.0% 
28.1 dB 
4.31% 

69.7% 
28.1 dB 
4.26% 
0.657 

50.0 dB 

738% 

32.6 dB 
8.9% 

381% 

38.4 dB 
48.5% 

110% 
49.2 dB 
82.3% 

99.3% 
50.1 dB 
98.6% 

705% 
33.0 dB 
9.73% 

689% 
33.2 dB 
10.2% 
0.588 
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Table 3. The results from the test image Baboon, N = 
512 X 512, wavelet used: Symmlet 16. The first row in each 
entry in the table is the scaled RMSE measure in (213), the 
second row is the SNR of the reconstruction, see (212), the 
third row shows the proportion of nonzero wavelet coefficient 
estimates 9* as a fraction of the sample size N and the fourth 
row shows the estimated value of the GGD shape parameter 
u provided by the INMDL algorithm. 


SNR 

Risks brink 

SureShrink 

^(0.70) 

^ MAP 

-‘-MAP 

NML 

INMDL 

1.0 dB 

25.1% 
12.8 dB 
0.0420% 

22.9% 

13.6 dB 
0.580% 

24.7% 
12.9 dB 
0.0801% 

24.9% 
12.8 dB 
0.731% 

93.7% 
3.86 dB 
40.3% 

35.3% 
10.1 dB 
1.31% 
1.04 

5.0 dB 

37.5% 
13.3 dB 
0.0946% 

34.3% 

14.1 dB 
1.39% 

37.2% 
13.4 dB 
1.45% 

38.9% 
13.1 dB 
19.1% 

83.2% 
7.40 dB 
21.2% 

42.9% 
12.3 dB 
1.29% 
0.95 

10.0 dB 

62.0% 
14.0 dB 
0.263% 

48.7% 

16.2 dB 
12.1% 

58.4% 
14.7 dB 
11.9% 

64.6% 
13.9 dB 
55.9% 

66.9% 
13.6 dB 
6.21% 

58.4% 
14.6 dB 
1.90% 
0.904 

15.0 dB 

97.9% 

15.1 dB 
0.929% 

66.1% 

18.5 dB 
23.6% 

74.7% 
17.5 dB 
29.7% 

80.9% 
16.9 dB 
77.2% 

81.1% 
16.8 dB 
5.93% 

83.8% 
16.5 dB 
3.31% 
0.912 

20.0 dB 

146% 

16.7 dB 
2.65% 

84.8% 
21.4 dB 
36.5% 

85.0% 
21.4 dB 
45.8% 

89.1% 
21.0 dB 
87.2% 

110% 
19.1 dB 
8.70% 

123% 
18.1 dB 
5.48% 
0.931 

25.0 dB 

214% 

18.3 dB 
5.27% 

105% 

24.5 dB 
48.0% 

92.3% 
25.7 dB 
56.7% 

93.3% 

25.6 

91.7% 

159% 
20.9 dB 
11.7% 

188% 

19.5dB 

5.27% 

0.954 

50.0 dB 

2788% 
21.1 dB 
10.7% 

1030% 
29.7 dB 
59.7% 

425% 
37.4 dB 
67.4% 

165% 
45.6 dB 
94.9% 

2271% 
22.9 dB 
15.4% 

2892% 
20.7 dB 
9.96% 
0.941 
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Table 4. The results from the test image Goldhill, N = 
512 X 512, wavelet used: Symmlet 16. The first row in each 
entry in the table is the scaled RMSE measure in (213), the 
second row is the SNR of the reconstruction, see (212), the 
third row shows the proportion of nonzero wavelet coefficient 
estimates 9* as a fraction of the sample size N and the fourth 
row shows the estimated value of the GGD shape parameter 
u provided by the INMDL algorithm. 


SNR 

Risks brink 

SureShrink 

^(0.70) 

^ MAP 

^ MAP 

NML 

INMDL 

1.0 dB 

21.2% 
14.3 dB 
0.0763% 

16.6% 

16.5 dB 
0.795% 

19.8% 
14.9 dB 
0.140% 

20.6% 
14.4 dB 
0.795% 

93.2% 
3.90 dB 
39.4% 

32.7% 
10.9 dB 
1.15% 
0.999 

5.0 dB 

28.5% 
15.8 dB 
0.161% 

23.0% 
17.7 dB 
1.45% 

28.3% 
15.9 dB 
1.34% 

33.1% 
14.6 dB 
18.6% 

79.9% 
7.73 dB 
18.9% 

35.2% 
14.1 dB 
1.26% 
0.891 

10.0 dB 

41.9% 
17.5 dB 
0.354% 

33.6% 
19.4 dB 
3.16% 

49.5% 
16.1 dB 
10.6% 

61.6% 
14.3 dB 
55.7% 

53.4% 
15.5 dB 
4.21% 

41.6% 
17.6 dB 
1.61% 
0.804 

15.0 dB 

61.6% 
19.2 dB 
0.80% 

45.8% 
21.7 dB 
9.63% 

68.1% 
18.4 dB 
28.3% 

79.9% 
17.0 dB 
77.7% 

56.5% 
20.0 dB 
3.29% 

54.2% 
20.3 dB 
2.34% 
0.758 

20.0 dB 

88.1% 
21.1 dB 
1.78% 

59.5% 
24.5 dB 
23.2% 

80.5% 
21.9 dB 
46.6% 

89.4% 
21.0 dB 
88.5% 

73.6% 
22.6 dB 
4.48% 

74.5% 
22.5 dB 
3.78% 
0.741 

25.0 dB 

124% 

23.1 dB 
3.56% 

76.1% 
27.4 dB 
35.4% 

88.1% 
26.10 dB 
60.9% 

94.2% 
25.5 dB 
93.6% 

104% 

24.7dB 

6.50% 

107% 
24.4 dB 
5.73% 
0.732 

50.0 dB 

1309% 
27.7 dB 
10.9% 

521% 

35.6 dB 
58.8% 

152% 

46.3dB 

78.6% 

103% 
49.8 dB 
97.9% 

1278% 
27.9 dB 
11.4% 

1352% 
27.4 dB 
10.3% 
0.710 
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Table 5. The results from the test image Bridge, N = 512 x 
512, wavelet used: Symmlet 16. The first row in each entry 
in the table is the scaled RMSE measure in (213), the second 
row is the SNR of the reconstruction, see (212), the third row 
shows the proportion of nonzero wavelet coefficient estimates 
9* as a fraction of the sample size N and the fourth row shows 
the estimated value of the GGD shape parameter u provided 
by the INMDL algorithm. 


SNR 

Risks brink 

SureShrink 

^(0.70) 

^ MAP 

7T-l(l-0) 

-‘-MAP 

NML 

INMDL 

1.0 dB 

27.2% 

12.1 dB 
0.0931% 

23.2% 

13.5 dB 
0.813% 

26.2% 
12.3 dB 
0.160% 

26.7% 
12.0 dB 
0.849% 

93.5% 
3.87 dB 
39.8% 

35.0% 
10.2 dB 
1.37% 
1.02 

5.0 dB 

38.5% 

13.1 dB 
0.200% 

31.0% 

15.0 dB 
2.90% 

36.2% 
13.7 dB 
1.62% 

38.3% 
13.2 dB 
18.9% 

82.3% 
7.48 dB 
24.5% 

41.5% 
12.6 dB 
1.42% 
0.928 

10.0 dB 

58.7% 
14.5 dB 
0.503% 

44.6% 

16.9 dB 
8.38% 

56.4% 
15.0 dB 
11.7% 

63.9% 
13.9 dB 
55.9% 

63.9% 
13.9 dB 
5.86% 

54.0% 
15.3 dB 
1.97% 
0.870 

15.0 dB 

88.3% 
16.0 dB 
1.23% 

62.0% 

19.1 dB 
19.8% 

73.4% 
17.7 dB 
30.1% 

80.9% 
16.9 dB 
77.9% 

75.6% 
17.4 dB 
5.46% 

77.1% 
17.2 dB 
3.13% 
0.848 

20.0 dB 

131% 

17.6 dB 
2.74% 

79.3% 
22.0 dB 
34.7% 

84.2% 
21.5 dB 
47.5% 

89.6% 
21.0 dB 
88.0% 

104% 
19.7 dB 
7.64 % 

113% 
18.9 dB 
5.06% 
0.841 

25.0 dB 

189% 

19.4 dB 
5.39% 

95.9 % 
25.3 dB 
49.8% 

91.0% 
25.8 dB 
60.2% 

93.9% 
25.5 dB 
92.9% 

152% 
21.4 dB 
10.3% 

173% 
20.2 dB 
7.06% 
0.844 

50.0 dB 

2126% 
23.4 dB 
14.3% 

644% 

33.8 dB 
68.2% 

244% 
42.3 dB 
74.9% 

119% 
48.5 dB 
96.7% 

2165% 
23.3 dB 
13.9% 

2660% 
21.5 dB 
9.30% 
0.829 
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Table 6. The results from the test image Boat, N = 512 x 
512, wavelet used: Symmlet 16. The first row in each entry 
in the table is the scaled RMSE measure in (213), the second 
row is the SNR of the reconstruction, see (212), the third row 
shows the proportion of nonzero wavelet coefficient estimates 
9* as a fraction of the sample size N and the fourth row shows 
the estimated value of the GGD shape parameter u provided 
by the INMDL algorithm. 


SNR 

Risks brink 

SureShrink 

^(0.70) 

^ MAP 

7T-l(l-0) 

-^MAP 

NML 

INMDL 

1.0 dB 

20.7% 
14.6 dB 
0.0687% 

17.4% 

16.1 dB 
0.713% 

19.8% 
14.9 dB 
0.124% 

20.4% 
14.5 dB 
0.792% 

93.2% 
3.90 dB 
39.3% 

33.3% 
10.8 dB 
1.18% 
1.02 

5.0 dB 

28.9% 
15.7 dB 
0.139% 

24.0% 

17.3 dB 
1.34% 

28.8% 
15.8 dB 
1.37% 

33.4% 
14.5 dB 
18.8% 

79.9% 
7.72 dB 
18.9% 

35.7% 
14.0 dB 
1.29% 
0.920 

10.0 dB 

43.4% 
17.2 dB 
0.333% 

32.7% 
19.7 dB 
4.74% 

49.3% 
16.2 dB 
10.8% 

61.8% 
14.3 dB 
55.9% 

53.6% 
15.5 dB 
4.43% 

41.5% 
17.6 dB 
1.71% 
0.842 

15.0 dB 

61.1% 
19.2 dB 
0.872% 

44.9% 
21.9 dB 
9.2% 

67.4% 
18.5 dB 
28.0% 

79.7% 

17.0dB 

77.7% 

54.6% 
20.3 dB 
3.56% 

51.9% 
20.7 dB 
2.55% 
0.812 

20.0 dB 

83.4% 
21.6 dB 
1.96% 

55.8% 
25.1 dB 
20.5% 

79.3% 
22.0 dB 
45.3% 

89.0% 
21.0 dB 
88.2% 

68.3% 
23.3 dB 
4.68% 

69.0% 
23.2 dB 
3.99% 
0.789 

25.0 dB 

115% 

23.7 dB 
3.59% 

73.5% 
27.7 dB 
26.8% 

86.5% 
26.3 dB 
58.1% 

93.6% 

25.6dB 

93.2% 

92.0% 
25.7 dB 
6.68% 

94.5% 
25.5 dB 
6.05% 
0.771 

50.0 dB 

1289% 
27.8 dB 
8.41% 

649% 

33.7 dB 
43.3% 

183% 
44.8 dB 
70.6% 

105% 
49.6 dB 
96.7% 

1070% 
29.4 dB 
11.7% 

1110% 
29.1 dB 
11.0% 
0.718 
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Table 7. The results from the test image Tank, N = 512 x 
512, wavelet used: Symmlet 16. The first row in each entry 
in the table is the scaled RMSE measure in (213), the second 
row is the SNR of the rconstruction, see (212), the third row 
shows the proportion of nonzero wavelet coefficient estimates 
6* as a fraction of the sample size N and the fourth row shows 
the estimated value of the GGD shape parameter u provided 
by the INMDL algorithm. 


SNR 

Risks brink 

SureShrink 

^(0.70) 

^ MAP 

zy-;(l-0) 

^ MAP 

NML 

INMDL 

1.0 dB 

19.4% 

15.1 dB 
0.0278% 

16.4% 

16.6 dB 
0.593% 

18.9% 
15.4 dB 
0.0618% 

19.0% 
15.3 dB 
0.667% 

93.2% 
3.89 dB 
39.2% 

33.6% 
10.7 dB 
1.15% 
1.04 

5.0 dB 

28.5% 
15.8 dB 
0.0698% 

23.4% 

17.5 dB 
1.14% 

29.1% 
15.7 dB 
1.23% 

33.4% 
14.5 dB 
18.3% 

79.8% 
7.73 dB 
18.5% 

36.7% 
13.8 dB 
1.27% 
0.951 

10.0 dB 

44.5% 
17.0 dB 
0.220% 

34.8% 

19.1 dB 
3.26% 

50.9% 
15.9 dB 
10.3% 

61.9% 
14.3 dB 
55.3% 

55.1% 
15.2 dB 
3.93% 

44.8% 
17.0 dB 
1.60% 
0.885 

15.0 dB 

67.6% 
18.3 dB 
0.633% 

49.8 % 
21.0 dB 
9.68% 

69.9% 
18.1 dB 
27.9% 

80.0% 
17.0 dB 
77.5% 

62.3% 
19.1 dB 
3.04% 

60.5% 
19.3 dB 
2.24% 
0.856 

20.0 dB 

101% 

19.8 dB 
1.52% 

68.7% 
23.2 dB 
24.0% 

82.4% 
21.7 dB 
46.4% 

89.4% 
21.0 dB 
88.4% 

87.9% 
21.1 dB 
4.00% 

88.9% 
21.0 dB 
3.41% 
0.850 

25.0 dB 

154% 

21.2 dB 
2.94% 

93.0% 
25.6 dB 
36.4% 

90.3% 
25.9 dB 
60.0% 

94.0% 
25.5 dB 
93.3% 

134% 
22.4 dB 
5.56% 

138% 
22.2 dB 
4.85% 
0.848 

50.0 dB 

2155% 
23.3 dB 
6.64% 

858% 

31.3 dB 
58.3% 

288% 
40.8 dB 
73.6% 

126% 
48.0 dB 
96.6% 

2039% 
23.8 dB 
7.86% 

2139% 
23.4 dB 
6.81% 
0.842 
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Table 8. The results from the test signal Blocks, N = 1024, 
wavelet used: Haar wavelet. The first row in each entry in 
the table is the scaled RMSE measure in (213), the second 
row is the SNR of the reconstruction, see (212), the third row 
shows the proportion of nonzero wavelet coefficient estimates 
6* as a fraction of the sample size N and the fourth row shows 
the fixed value of the GGD shape parameter u used by the 
INMDL algorithm. 


SNR 

Risks brink 

SureShrink 

^ MAP 

rT-i(l-O) 

-^MAP 

NML 

INMDL 

1.0 dB 

43.4% 
8.11 dB 
1.66% 

35.2% 
9.45 dB 
3.22% 

42.1% 
7.82 dB 
1.66% 

41.1% 
6.99 dB 
7.32% 

93.0% 
3.89 dB 
44.3% 

54.0% 
6.91 dB 
7.23% 
1.00 

5.0 dB 

42.0% 
12.5 dB 
2.83% 

39.4% 
12.7 dB 
5.47% 

42.4% 
12.2 dB 
3.61% 

48.6% 
10.8 dB 
34.3% 

78.9% 
7.81 dB 
21.1% 

52.9% 
10.7 dB 
7.23% 
1.00 

10.0 dB 

48.5% 
16.3 dB 
3.71% 

44.9% 
16.7 dB 
11.8% 

45.4% 
16.8 dB 
7.81% 

69.0% 
13.2 dB 
66.4% 

57.5% 
14.9 dB 
9.18% 

46.9% 
16.6 dB 
6.83% 
1.00 

15.0 dB 

39.8% 

23.0dB 

5.08% 

44.9% 
21.8 dB 
14.9% 

53.4% 
20.5 dB 
12.7% 

82.0% 
16.8 dB 
81.6% 

46.1% 
21.8 dB 
7.23% 

41.8% 
22.6 dB 
6.64% 
1.00 

20.0 dB 

30.5% 
30.3 dB 
5.96% 

46.3% 
26.6 dB 
15.0% 

62.9% 
24.0 dB 
20.6% 

89.4% 
21.0 dB 
90.7% 

38.9% 
28.2 dB 
6.64% 

38.5% 
28.3 dB 
6.64% 
1.00 

25.0 dB 

28.1% 
36.0 dB 
6.15% 

44.4% 
32.0 dB 
19.2% 

71.2% 
28.0 dB 
29.1% 

93.2% 
25.6 dB 
94.4% 

35.0% 
34.1 dB 
6.64% 

33.8% 
34.4 dB 
6.54% 
1.00 

50.0 dB 

21.9% 
63.2 dB 
5.96% 

50.2% 
56.0 dB 
8.39% 

80.4% 
51.9 dB 
42.7% 

98.3% 
50.2 dB 
99.4% 

25.9% 
61.7 dB 
6.05% 

25.9% 
61.7 dB 
6.05% 
1.00 
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H. NUMERICAL RESULTS 


Table 9. The results from the test signal Bumps, N = 1024, 
wavelet used: Symmlet 12. The first row in each entry in the 
table is the scaled RMSE measure in (213), the second row is 
the SNR of the reconstruction, see (212), the third row shows 
the proportion of nonzero wavelet coefficient estimates 6* as 
a fraction of the sample size N and the fourth row shows 
the fixed value of the GGD shape parameter u used by the 
INMDL algorithm. 


SNR 

Risks brink 

SureShrink 

Ay-t(0.5) 

-‘■MAP 

^(1-0) 

-‘■MAP 

NML 

INMDL 

1.0 dB 

44.7% 
7.59 dB 
1.76% 

37.2% 
8.58 dB 
5.57% 

47.5% 
6.01 dB 
1.56% 

56.6% 
2.15 dB 
2.15% 

91.0% 
3.92 dB 
38.9% 

55.2% 
6.57 dB 
7.81 % 
1.00 

5.0 dB 

55.3% 
9.83 dB 
2.34% 

40.1% 

12.2 dB 
7.62% 

52.3% 
9.97 dB 
2.73% 

49.2% 
9.94 dB 
15.0% 

76.9% 
7.94 dB 
20.1% 

55.0% 
10.2 dB 
7.42% 
1.00 

10.0 dB 

57.9% 
14.6 dB 
4.39% 

47.2% 

16.1dB 

8.79% 

46.7% 
16.4 dB 
6.35% 

61.7% 
14.0 dB 
53.8% 

63.1% 
14.1 dB 
11.1% 

52.8% 
15.5 dB 
7.62% 
1.00 

15.0 dB 

55.6% 
20.1 dB 
6.15% 

56.5% 
19.7 dB 
10.8% 

52.7% 
20.5 dB 
11.5% 

78.5% 
17.1 dB 
78.1% 

52.8% 
20.5 dB 
9.38% 

51.2% 
20.8 dB 
8.79% 
1.00 

20.0 dB 

53.8% 
25.4 dB 
8.20% 

56.0% 
24.9 dB 
15.7% 

63.0% 
24.0 dB 
18.8% 

87.9% 
21.1 dB 
89.6% 

56.2% 
25.0 dB 
9.38% 

54.8% 
25.2 dB 
9.18% 
1.00 

25.0 dB 

62.7% 

29.0dB 

9.87% 

63.1% 
28.9 dB 
18.6% 

71.7% 
27.9 dB 
28.0% 

92.7% 
25.7 dB 
93.8% 

57.1% 
29.9 dB 
10.9% 

57.4% 
29.8 dB 
10.5% 
1.00 

50.0 dB 

79.7% 
52.0 dB 
22.0% 

83.6% 
51.6 dB 
45.7% 

91.5% 
50.8 dB 
68.8% 

98.0% 
50.2 dB 
99.6% 

80.5% 
51.9 dB 
21.9% 

80.5% 
51.9 dB 
21.9% 
1.00 
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Table 10. The results from the test signal Heavisine, N = 
1024, wavelet used: Symmlet 12. The first row in each entry 
in the table is the scaled RMSE measure in (213), the second 
row is the SNR of the reconstruction, see (212), the third row 
shows the proportion of nonzero wavelet coefficient estimates 
9* as a fraction of the sample size N and the fourth row shows 
the fixed value of the GGD shape parameter u used by the 
INMDL algorithm. 


SNR 

Risks brink 

SureShrink 

7y-t(0.5) 

-‘■MAP 

7T-l(l-0) 

-‘■MAP 

NML 

INMDL 

1.0 dB 

14.7% 
18.0 dB 
0.684% 

15.6% 
16.7 dB 
0.977% 

19.2% 
15.2 dB 
0.586% 

24.3% 
11.8 dB 
1.27% 

89.9% 
4.28 dB 
36.9% 

50.6% 
7.81 dB 
5.86% 
1.00 

5.0 dB 

21.9% 
18.3 dB 
0.684% 

17.1% 
20.0 dB 
1.37% 

23.2% 
17.7 dB 
0.781% 

29.4% 
15.4 dB 
16.4% 

72.2% 
8.66 dB 
14.6% 

48.2% 
11.8 dB 
5.27% 
1.00 

10.0 dB 

22.5% 
23.1 dB 
1.07% 

23.5% 
22.4 dB 
1.56% 

30.3% 
20.5 dB 
2.44% 

57.4% 
14.9 dB 
55.0% 

49.4% 
16.3 dB 
4.69% 

46.7% 
16.8 dB 
4.49% 
1.00 

15.0 dB 

31.7% 
25.0 dB 
1.47% 

30.3% 
25.3 dB 
2.25% 

42.8% 
22.4 dB 
5.66% 

76.2% 
17.5 dB 
76.2% 

39.6% 
23.1 dB 
3.13% 

39.0% 
23.2 dB 
3.13% 
1.00 

20.0 dB 

36.1% 
28.9 dB 
1.95% 

35.0% 
29.0 dB 
4.30% 

54.2% 
25.4 dB 
12.5% 

86.3% 
21.3 dB 
88.0% 

35.4% 
29.1 dB 
2.93% 

36.4% 
28.8 dB 
3.42% 
1.00 

25.0 dB 

32.6% 
34.8 dB 
2.93% 

40.2% 
32.9 dB 
7.62% 

64.4% 
28.9 dB 
20.2% 

91.7% 
25.8 dB 
92.7% 

34.4% 
34.3 dB 
3.71% 

34.3% 
34.3 dB 
3.71% 
1.00 

50.0 dB 

38.4% 
58.3 dB 
7.03 % 

51.5% 
55.8 dB 
19.9% 

90.5% 
50.9 dB 
65.5% 

97.9% 
50.2 dB 
99.6% 

38.4% 
58.3 dB 
7.03% 

38.4% 
58.3 dB 
7.03% 
1.00 
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H. NUMERICAL RESULTS 


Table 11. The results from the test signal Doppler, N = 
1024, wavelet used: Symmlet 12. The first row in each entry 
in the table is the scaled RMSE measure in (213), the second 
row is the SNR of the reconstruction, see (212), the third row 
shows the proportion of nonzero wavelet coefficient estimates 
9* as a fraction of the sample size N and the fourth row shows 
the fixed value of the GGD shape parameter u used by the 
INMDL algorithm. 


SNR 

Risks brink 

SureShrink 

Ay-t(0.5) 

-‘■MAP 

^(1-0) 

-‘■MAP 

NML 

INMDL 

1.0 dB 

33.8% 
9.98 dB 
1.37% 

35.1% 
8.98 dB 
2.83% 

41.6% 
7.30 dB 
1.17% 

51.8% 
3.06 dB 
1.56% 

92.3% 
3.82 dB 
42.8% 

52.6% 
7.01 dB 
7.42% 
1.00 

5.0 dB 

38.4% 

13.1 dB 
1.95% 

39.5% 
12.4 dB 
3.81% 

37.5% 
13.0 dB 
2.34% 

41.1% 
11.7 dB 
15.0% 

76.0% 
7.98 dB 
18.2% 

50.6% 
11.0 dB 
6.64% 
1.00 

10.0 dB 

g 

42.7% 

17.3dB 

2.64% 

45.5% 
16.4 dB 
5.18% 

39.4% 
17.9 dB 
4.30% 

57.3% 
14.6 dB 
50.1% 

58.7% 
14.7 dB 
8.98% 

52.0% 
15.7 dB 
7.03% 
1.00 

15.0 dB 

42.3% 
22.4 dB 
4.00% 

53.2% 
20.3 dB 
9.57% 

48.8% 
21.2 dB 
8.11% 

75.7% 
17.4 dB 
74.3% 

52.9% 
20.5 dB 
7.03% 

49.4% 
21.1 dB 
6.25% 
1.00 

20.0 dB 

47.7% 
26.4 dB 
5.27% 

57.2% 
24.7 dB 
12.7% 

56.5% 
24.9 dB 
12.7% 

86.0% 
21.3 dB 
87.7% 

49.8% 
26.0 dB 
7.23% 

47.1% 
26.5 dB 
6.83% 
1.00 

25.0 dB 

57.8% 
29.8 dB 
6.25% 

64.0% 
28.8 dB 
14.9% 

64.2% 
28.8 dB 
21.0% 

91.6% 
25.8 dB 
92.1% 

50.1% 
31.0 dB 
8.00% 

49.0% 
31.2 dB 
7.81% 
1.00 

50.0 dB 

60.0% 
54.4 dB 
12.3% 

67.6% 
53.4 dB 
30.6% 

90.6% 
50.9 dB 
65.5% 

97.9% 
50.2 dB 
99.8% 

60.0% 
54.4 dB 
12.3% 

60.0% 
54.4 dB 
12.3% 
1.00 




























































